Project 7 - Bank Churn Prediction¶

Description¶

Background and Context¶

Businesses like banks that provide services have to worry about the problem of churn, i.e., customers leaving to join another service provider. It is important to understand which aspects of the service influence a customer's decision in this regard, so that management can concentrate improvement efforts on these priorities.

Objective¶

Given a Bank customer, build a neural network-based classifier that can determine whether they will leave or not in the next 6 months.

Data Description¶

The case study is from an open-source dataset from Kaggle. The dataset contains 10,000 sample points with 14 distinct features such as CustomerId, CreditScore, Geography, Gender, Age, Tenure, Balance, etc.

Data Dictionary¶

CustomerId: Unique ID assigned to each customer
Surname: Last name of the customer
CreditScore: Defines the credit history of the customer
Geography: A customer's location
Gender: Gender of the customer
Age: Age of the customer
Tenure: Number of years the customer has been with the bank
NumOfProducts: Number of products the customer has purchased through the bank
Balance: Account balance
HasCrCard: Categorical variable indicating whether the customer has a credit card
EstimatedSalary: Estimated salary of the customer
IsActiveMember: Categorical variable indicating whether the customer is an active member of the bank (i.e., uses bank products regularly, makes transactions, etc.)
Exited: Categorical variable indicating whether the customer left the bank within six months. It can take two values:
0 = No (the customer did not leave the bank)
1 = Yes (the customer left the bank)

Loading the required libraries

In [77]:
# Import the libraries needed. If not installed already, install them first using !pip install <library name>

# Libraries to help with reading and manipulating data
import pandas as pd
import numpy as np

# Libraries to help with data visualization
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

# Libraries to encode, scale, and split the data
from sklearn.preprocessing import OneHotEncoder, StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split

# TensorFlow/Keras: model building blocks, optimizers, loss functions, and utilities
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import backend, losses, optimizers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation
# Utility to convert the target variables to one-hot numpy arrays
from tensorflow.keras.utils import to_categorical

# Metrics to evaluate the classifiers
from sklearn.metrics import (
    classification_report,
    f1_score,
    accuracy_score,
    recall_score,
    precision_score,
    confusion_matrix,
    roc_auc_score,
    plot_confusion_matrix,
    precision_recall_curve,
    roc_curve,
    make_scorer,
)

# Libraries to handle class imbalance by resampling
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

# For reproducibility of results
import random

# Library to suppress warnings
import warnings
warnings.filterwarnings('ignore')

Mounting the Drive

In [2]:
# Connect to Google Drive so that we can read the data file.
from google.colab import drive
drive.mount('/content/drive/')

# Read the data file and take a sneak peek
data = pd.read_csv("/content/drive/My Drive/BankChurn.csv")
data.head()
Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).
Out[2]:
RowNumber CustomerId Surname CreditScore Geography Gender Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited
0 1 15634602 Hargrave 619 France Female 42 2 0.00 1 1 1 101348.88 1
1 2 15647311 Hill 608 Spain Female 41 1 83807.86 1 0 1 112542.58 0
2 3 15619304 Onio 502 France Female 42 8 159660.80 3 1 0 113931.57 1
3 4 15701354 Boni 699 France Female 39 1 0.00 2 0 0 93826.63 0
4 5 15737888 Mitchell 850 Spain Female 43 2 125510.82 1 1 1 79084.10 0
In [3]:
# Let's look at the basic row/column count
data.shape
Out[3]:
(10000, 14)
In [4]:
# Let's look at basic non-null counts and data types
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   RowNumber        10000 non-null  int64  
 1   CustomerId       10000 non-null  int64  
 2   Surname          10000 non-null  object 
 3   CreditScore      10000 non-null  int64  
 4   Geography        10000 non-null  object 
 5   Gender           10000 non-null  object 
 6   Age              10000 non-null  int64  
 7   Tenure           10000 non-null  int64  
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64  
 10  HasCrCard        10000 non-null  int64  
 11  IsActiveMember   10000 non-null  int64  
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64  
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB
In [5]:
# Let's look at the number of distinct values of the columns
data.nunique()
Out[5]:
RowNumber          10000
CustomerId         10000
Surname             2932
CreditScore          460
Geography              3
Gender                 2
Age                   70
Tenure                11
Balance             6382
NumOfProducts          4
HasCrCard              2
IsActiveMember         2
EstimatedSalary     9999
Exited                 2
dtype: int64

RowNumber and CustomerId are unique identifiers, so they can be dropped. Surname is also not a useful field for model building, so it can be dropped as well.

In [6]:
# Check how many fields have NA
data.isna().sum()
Out[6]:
RowNumber          0
CustomerId         0
Surname            0
CreditScore        0
Geography          0
Gender             0
Age                0
Tenure             0
Balance            0
NumOfProducts      0
HasCrCard          0
IsActiveMember     0
EstimatedSalary    0
Exited             0
dtype: int64
In [7]:
# Check how many fields have nulls
data.isnull().sum()
Out[7]:
RowNumber          0
CustomerId         0
Surname            0
CreditScore        0
Geography          0
Gender             0
Age                0
Tenure             0
Balance            0
NumOfProducts      0
HasCrCard          0
IsActiveMember     0
EstimatedSalary    0
Exited             0
dtype: int64
In [8]:
# Check for duplicate rows now
data.duplicated(keep=False).sum()
Out[8]:
0
In [9]:
# Dropping columns that won't be part of the modeling
data = data.drop(['RowNumber', 'CustomerId','Surname'], axis = 1)
data.head()
Out[9]:
CreditScore Geography Gender Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited
0 619 France Female 42 2 0.00 1 1 1 101348.88 1
1 608 Spain Female 41 1 83807.86 1 0 1 112542.58 0
2 502 France Female 42 8 159660.80 3 1 0 113931.57 1
3 699 France Female 39 1 0.00 2 0 0 93826.63 0
4 850 Spain Female 43 2 125510.82 1 1 1 79084.10 0
In [10]:
# Let's confirm the changes
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   CreditScore      10000 non-null  int64  
 1   Geography        10000 non-null  object 
 2   Gender           10000 non-null  object 
 3   Age              10000 non-null  int64  
 4   Tenure           10000 non-null  int64  
 5   Balance          10000 non-null  float64
 6   NumOfProducts    10000 non-null  int64  
 7   HasCrCard        10000 non-null  int64  
 8   IsActiveMember   10000 non-null  int64  
 9   EstimatedSalary  10000 non-null  float64
 10  Exited           10000 non-null  int64  
dtypes: float64(2), int64(7), object(2)
memory usage: 859.5+ KB

Exploratory Data Analysis¶

Univariate Analysis¶

In [11]:
# Let's check the summary statistics of the data
data.describe().T
Out[11]:
count mean std min 25% 50% 75% max
CreditScore 10000.0 650.528800 96.653299 350.00 584.00 652.000 718.0000 850.00
Age 10000.0 38.921800 10.487806 18.00 32.00 37.000 44.0000 92.00
Tenure 10000.0 5.012800 2.892174 0.00 3.00 5.000 7.0000 10.00
Balance 10000.0 76485.889288 62397.405202 0.00 0.00 97198.540 127644.2400 250898.09
NumOfProducts 10000.0 1.530200 0.581654 1.00 1.00 1.000 2.0000 4.00
HasCrCard 10000.0 0.705500 0.455840 0.00 0.00 1.000 1.0000 1.00
IsActiveMember 10000.0 0.515100 0.499797 0.00 0.00 1.000 1.0000 1.00
EstimatedSalary 10000.0 100090.239881 57510.492818 11.58 51002.11 100193.915 149388.2475 199992.48
Exited 10000.0 0.203700 0.402769 0.00 0.00 0.000 0.0000 1.00
In [12]:
# EDA functions that will help us investigate the data
def histogram_boxplot(data, feature, figsize=(15, 10), kde=False, bins=None):
    """
    Boxplot and histogram combined

    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (15,10))
    kde: whether to show the density curve (default False)
    bins: number of bins for histogram (default None)
    """
    sns.set(color_codes=True)

    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # Number of rows of the subplot grid= 2
        sharex=True,  # x-axis will be shared among all subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )  # creating the 2 subplots
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  # boxplot will be created and a star will indicate the mean value of the column
    if bins:
        sns.histplot(
            data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins, palette="winter"
        )  # For histogram with a specified number of bins
    else:
        sns.histplot(data=data, x=feature, kde=kde, ax=ax_hist2)  # For histogram
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  # Add mean to the histogram
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  # Add median to the histogram

# function to create labeled barplots


def labeled_barplot(data, feature, perc=False, n=None):
    """
    Barplot with percentage at the top

    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """
    sns.set(color_codes=True)

    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 2, 6))
    else:
        plt.figure(figsize=(n + 2, 6))

    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="PuBu",
        order=data[feature].value_counts(ascending=True).index[:n], 
    )

    for p in ax.patches:
        if perc == True:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category

        x = p.get_x() + p.get_width() / 2  # width of the plot
        y = p.get_height()  # height of the plot

        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the percentage

    plt.show()  # show the plot

def stacked_barplot(data, predictor, target):
    """
    Print the category counts and plot a stacked bar chart

    data: dataframe
    predictor: independent variable
    target: target variable
    """
    sns.set(color_codes=True)
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]  # sort categories by the less frequent target class (here, Exited=1)
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 50)
    tab = pd.crosstab(data[predictor], data[target], margins=True,normalize="index").sort_values(
        by=sorter, ascending=False
    ).round(4)*100
    print(tab)
    print("-" * 50)

    tab.plot(kind="bar", stacked=True, figsize=(count + 5, 5))
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()


def confusion_matrix_sklearn(model, predictors, target):
    """
    To plot the confusion_matrix with percentages

    model: classifier
    predictors: independent variables
    target: dependent variable
    """
    y_pred = model.predict(predictors)
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)

    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")


# The functions below display the model information and are taken from the code shared in the weekly sessions.
# defining a function to compute different metrics to check performance of a classification model built using sklearn


def model_performance_classification_sklearn_with_threshold(
    model, predictors, target, threshold=0.5
):
    """
    Function to compute different metrics, based on the threshold specified, to check classification model performance

    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """

    # predicting churn probabilities using the independent variables
    pred_prob = model.predict_proba(predictors)[:, 1]
    # applying the threshold to turn probabilities into 0/1 class labels
    pred = (pred_prob > threshold).astype(int)

    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred)  # to compute Recall
    precision = precision_score(target, pred)  # to compute Precision
    f1 = f1_score(target, pred)  # to compute F1-score

    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {
            "Accuracy": acc,
            "Recall": recall,
            "Precision": precision,
            "F1": f1,
        },
        index=[0],
    )

    return df_perf

Count plot for categorical variables¶

Geography

In [13]:
labeled_barplot(data, "Geography", perc=True)
  • About half of the customers are from France; the rest are split fairly evenly between Spain and Germany.

Gender

In [14]:
labeled_barplot(data, "Gender", perc=True)
  • The majority of customers are male, but the difference is not large.

Distribution of numerical variables¶

In [15]:
# Call the above defined utility function for each numerical column
for col in data.select_dtypes(include=['int64','float64']).columns:
  histogram_boxplot(data, col)

Observations¶

  • CreditScore - This has a roughly normal distribution, with some outliers on the low end and a spike in customer count at the maximum credit score.
  • Age - This has a somewhat right-skewed distribution with many outliers on the high end.
  • Tenure - This is fairly uniformly distributed.
  • Balance - There are many zero-balance customers. Barring these, it has a normal distribution.
  • NumOfProducts - Most customers have one or two products. A much smaller number have 3 or 4 products.
  • HasCrCard - 70% of customers have a credit card and 30% do not.
  • IsActiveMember - This is an even split between 0 and 1.
  • EstimatedSalary - This is uniformly distributed between 11 and about 200K. There isn't enough skew to justify a log transform on this attribute.
  • Exited - This is the target variable: 80% of customers have the value 0 and 20% have the value 1, so the distribution is imbalanced. We may employ under-sampling and/or over-sampling to handle this; see the resampling sketch after this list.
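
Since the class split is roughly 80:20, the SMOTE and RandomUnderSampler imports above could be used to rebalance the training data. A minimal sketch, assuming train splits like the X_train/y_train created later in this notebook; the sampling ratios are illustrative assumptions, not tuned values, and resampling must be applied to the training split only, never the test set:

from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

# Over-sample the minority (Exited=1) class with synthetic points
sm = SMOTE(sampling_strategy=0.5, random_state=5)  # assumed 1:2 minority:majority ratio
X_train_over, y_train_over = sm.fit_resample(X_train, y_train)

# Or under-sample the majority (Exited=0) class instead
rus = RandomUnderSampler(sampling_strategy=1.0, random_state=5)  # assumed 1:1 ratio
X_train_under, y_train_under = rus.fit_resample(X_train, y_train)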

Bivariate analysis¶

In [16]:
# Let's look at the correlation values of numerical attributes.
plt.figure(figsize=(15, 7))
sns.heatmap(
    data.corr(), annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral"
)
plt.show()
  • Age has a weak positive correlation with Exited.
  • Balance has a negative correlation with NumOfProducts.
In [17]:
# Let's look at the distribution of numerical attributes showing a breakdown by the values of the categorical attributes.

ccol = data.select_dtypes(include=['object']).columns
ncol = data.select_dtypes(include=['int64','float64']).columns

for c1 in ccol:
  for c2 in ncol:
    plt.figure(figsize=(15,8))
    a=sns.boxplot(x=c2,y=c1,data=data,orient="h")
In [18]:
# Let's look at the distribution of the numerical attributes for those attributes that have 0/1 values.
ccol = ['HasCrCard','IsActiveMember','Exited']
ncol = ['CreditScore','Age','Tenure','Balance','NumOfProducts','EstimatedSalary']

for c1 in ccol:
  for c2 in ncol:
    plt.figure(figsize=(15,8))
    a=sns.boxplot(x=c2,y=c1,data=data,orient="h")
In [19]:
# Let's look at the stacked bar plots to continue the bivariate analysis
hcol = 'Exited'
vcols = ['Geography','Gender','Tenure','NumOfProducts','HasCrCard','IsActiveMember']
for vcol in vcols:
  stacked_barplot(data,vcol,hcol)
Exited        0     1    All
Geography                   
All        7963  2037  10000
Germany    1695   814   2509
France     4204   810   5014
Spain      2064   413   2477
--------------------------------------------------
Exited         0      1
Geography              
Germany    67.56  32.44
All        79.63  20.37
Spain      83.33  16.67
France     83.85  16.15
--------------------------------------------------
Exited     0     1    All
Gender                   
All     7963  2037  10000
Female  3404  1139   4543
Male    4559   898   5457
--------------------------------------------------
Exited      0      1
Gender              
Female  74.93  25.07
All     79.63  20.37
Male    83.54  16.46
--------------------------------------------------
Exited     0     1    All
Tenure                   
All     7963  2037  10000
1        803   232   1035
3        796   213   1009
9        771   213    984
5        803   209   1012
4        786   203    989
2        847   201   1048
8        828   197   1025
6        771   196    967
7        851   177   1028
10       389   101    490
0        318    95    413
--------------------------------------------------
Exited      0      1
Tenure              
0       77.00  23.00
1       77.58  22.42
9       78.35  21.65
3       78.89  21.11
5       79.35  20.65
10      79.39  20.61
4       79.47  20.53
All     79.63  20.37
6       79.73  20.27
8       80.78  19.22
2       80.82  19.18
7       82.78  17.22
--------------------------------------------------
Exited            0     1    All
NumOfProducts                   
All            7963  2037  10000
1              3675  1409   5084
2              4242   348   4590
3                46   220    266
4                 0    60     60
--------------------------------------------------
Exited             0       1
NumOfProducts               
4               0.00  100.00
3              17.29   82.71
1              72.29   27.71
All            79.63   20.37
2              92.42    7.58
--------------------------------------------------
Exited        0     1    All
HasCrCard                   
All        7963  2037  10000
1          5631  1424   7055
0          2332   613   2945
--------------------------------------------------
Exited         0      1
HasCrCard              
0          79.19  20.81
All        79.63  20.37
1          79.82  20.18
--------------------------------------------------
Exited             0     1    All
IsActiveMember                   
All             7963  2037  10000
0               3547  1302   4849
1               4416   735   5151
--------------------------------------------------
Exited              0      1
IsActiveMember              
0               73.15  26.85
All             79.63  20.37
1               85.73  14.27
--------------------------------------------------
In [20]:
# Let's get the 5 point summary and see how they vary for customers based on the target variable
data[data['Exited']==0].describe().T
Out[20]:
count mean std min 25% 50% 75% max
CreditScore 7963.0 651.853196 95.653837 405.00 585.00 653.00 718.000 850.00
Age 7963.0 37.408389 10.125363 18.00 31.00 36.00 41.000 92.00
Tenure 7963.0 5.033279 2.880658 0.00 3.00 5.00 7.000 10.00
Balance 7963.0 72745.296779 62848.040701 0.00 0.00 92072.68 126410.280 221532.80
NumOfProducts 7963.0 1.544267 0.509536 1.00 1.00 2.00 2.000 3.00
HasCrCard 7963.0 0.707146 0.455101 0.00 0.00 1.00 1.000 1.00
IsActiveMember 7963.0 0.554565 0.497045 0.00 0.00 1.00 1.000 1.00
EstimatedSalary 7963.0 99738.391772 57405.586966 90.07 50783.49 99645.04 148609.955 199992.48
Exited 7963.0 0.000000 0.000000 0.00 0.00 0.00 0.000 0.00
In [21]:
data[data['Exited']==1].describe().T
Out[21]:
count mean std min 25% 50% 75% max
CreditScore 2037.0 645.351497 100.321503 350.00 578.00 646.00 716.00 850.00
Age 2037.0 44.837997 9.761562 18.00 38.00 45.00 51.00 84.00
Tenure 2037.0 4.932744 2.936106 0.00 2.00 5.00 8.00 10.00
Balance 2037.0 91108.539337 58360.794816 0.00 38340.02 109349.29 131433.33 250898.09
NumOfProducts 2037.0 1.475209 0.801521 1.00 1.00 1.00 2.00 4.00
HasCrCard 2037.0 0.699067 0.458776 0.00 0.00 1.00 1.00 1.00
IsActiveMember 2037.0 0.360825 0.480358 0.00 0.00 0.00 1.00 1.00
EstimatedSalary 2037.0 101465.677531 57912.418071 11.58 51907.72 102460.84 152422.91 199808.10
Exited 2037.0 1.000000 0.000000 1.00 1.00 1.00 1.00 1.00

Observations¶

  • There is a small difference in the median age across geographies. The median age in Germany is greater than in Spain, which is in turn greater than in France.
  • Age is higher for those customers that have exited.
  • The median balance in Germany is much higher than in Spain and France.
  • Balance is higher for customers that exited.
  • The distribution of tenure differs between males and females, with higher values for males.
  • Tenure is higher for customers with a credit card.
  • Tenure is lower for active customers.
  • The target variable (Exited) distribution varies by country, with the highest percentage of exited customers in Germany and the lowest in France.
  • It varies by gender, with a higher percentage of female customers having exited than male customers.
  • It varies by the number of products owned: customers with 3 or 4 products are far more likely to exit, while those with 2 products exit the least.
  • It varies by whether the customer is active, with a higher exit rate for inactive customers.
In [22]:
# Let's continue the bivariate analysis using scatterplots.
sns.pairplot(data=data, diag_kind="kde")
plt.show()
In [23]:
# Let's extend the above by using a hue attribute.
ncols = ['CreditScore','Age','Tenure','Balance','NumOfProducts','EstimatedSalary']
hcols = ['Exited','HasCrCard','IsActiveMember']

for c1 in ncols:
  for c2 in ncols:
    if(c1>c2):
      for hcol in hcols:
        plt.figure(figsize = (15, 8))
        sns.scatterplot(data = data,x = c1,y = c2,hue = hcol)
        plt.show()
  • Our earlier observations on the exit rate and age can be seen in these plots as well. No significant additional insights were obtained here beyond those already mentioned.

Data Preparation¶

This dataset contains both numerical and categorical variables. We need to treat them before we pass them on to the neural network. We will perform the pre-processing steps below:

  • One hot encoding of categorical variables
  • Scaling numerical variables

An important point to remember: before we scale the numerical variables, we first split the dataset into train and test sets and scale each separately. Otherwise, we would leak information from the test data into the train data, and the resulting model might give a false sense of good performance. This is known as data leakage, which we want to avoid.
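
To make the pitfall concrete, here is a minimal sketch contrasting the leaky and the correct order of operations (it assumes the X_train/X_test splits created in the next few cells):

from sklearn.preprocessing import StandardScaler

# Leaky: the scaler's mean and std would be computed from ALL rows, including test data
# scaler = StandardScaler().fit(X)   # don't do this

# Correct: fit on the training split only, then reuse those statistics for both splits
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)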

In [24]:
X = data.drop(['Exited'],axis=1)
y = data[['Exited']]
In [25]:
print(X.shape)
print(y.shape)
(10000, 10)
(10000, 1)
In [26]:
# Let's do one hot encoding of categorical variables. Note that this step does not cause data leakage and hence can be done before splitting.
cat_data = ['Geography','Gender']
X = pd.get_dummies(X,columns=cat_data,drop_first= True)

Now, let's split the dataset into train and test datasets. To do that, we have already extracted all the independent variables into a variable X, and the target variable Exited is saved in a variable y. These two variables will be used to split the parent dataset into train and test datasets. The dataset is small, and the Keras implementation provides an argument for holding out some percentage of the training data as validation data to check the accuracy of the model. Therefore, we will split the data in an 80:20 ratio.

In [27]:
# Splitting the dataset into the train (80%) and the test data (20%)
X_train, X_test, y_train, y_test =  train_test_split(X, y, test_size = 0.2, random_state = 5)
In [28]:
print(X.shape)
print(y.shape)
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)
(10000, 11)
(10000, 1)
(8000, 11)
(8000, 1)
(2000, 11)
(2000, 1)

Now, we will perform scaling on the numerical variables separately for train and test sets. We will perform fit and transform on the train data and then we will only perform transform on the test data.

In [29]:
## Scaling the data. Since y is categorical (0/1 values), we don't need to rescale y.
sc=StandardScaler()
temp = sc.fit(X_train)
X_train = temp.transform(X_train)
X_test = temp.transform(X_test)
In [30]:
print(X.shape)
print(y.shape)
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)
(10000, 11)
(10000, 1)
(8000, 11)
(8000, 1)
(2000, 11)
(2000, 1)

First, let's set the seed for random number generators in NumPy, Python, and TensorFlow to be able to reproduce the same results every time we run the code.

In [31]:
# Fixing the seed for random number generators
np.random.seed(5)
random.seed(5)
tf.random.set_seed(5)

Model Building¶

In neural networks, there are many hyper-parameters that we can play around with to tune the network for the best results. Some of them are:

  1. Number of hidden layers
  2. Number of neurons in each hidden layer
  3. Activation functions in hidden layers
  4. Optimizers
  5. Random initialization of weights and biases
  6. Batch size
  7. Learning rate
  8. Early stopping
  9. L1 and L2 Regularization
  10. Dropout
  11. Momentum

and so on.
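
Several of these knobs map directly onto Keras arguments. Below is a minimal, untrained sketch (the layer sizes, 0.2 dropout rate, and 0.01 L2 strength are illustrative assumptions, not tuned values) showing where dropout, L2 regularization, the learning rate, and the batch size would plug in:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.regularizers import l2

sketch = Sequential()
sketch.add(Dense(64, activation='relu', input_shape=(11,),
                 kernel_regularizer=l2(0.01)))  # L2 penalty on this layer's weights
sketch.add(Dropout(0.2))  # randomly zeroes 20% of activations during training
sketch.add(Dense(1, activation='sigmoid'))
sketch.compile(loss='binary_crossentropy',
               optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3))  # explicit learning rate
# Batch size and epochs are set at fit time, e.g.:
# sketch.fit(X_train, y_train, validation_split=0.1, epochs=50, batch_size=32)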

Let's build a feed-forward neural network with 2 hidden layers and the output layer.

In [32]:
# We will be adding the layers sequentially
model_1 = Sequential()

# First hidden layer with 128 neurons and relu activation function, the input_shape tuple denotes number of independent variables
model_1.add(Dense(128, activation = 'relu', input_shape = (11, )))

# Second hidden layer with 64 neurons and relu activation function
model_1.add(Dense(64, activation = 'relu'))

# Output layer with only one neuron and sigmoid as activation function which will give the probability of customer exiting
model_1.add(Dense(1, activation = 'sigmoid'))

Once we are done with the model architecture, we need to compile the model, where we need to provide the loss function that we want to optimize, the optimization algorithm, and the evaluation metric that we are interested in to evaluate the model.
Since this is a binary classification task, we will be minimizing the binary_crossentropy and we can choose one optimizer out of

  1. SGD
  2. RMSprop
  3. Adam

This is a hyper-parameter. You can play around with these optimizers to check which one performs better on a particular dataset.
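
Each of these can be passed to compile() either by name or as an object from tensorflow.keras.optimizers when an explicit learning rate (or momentum, for SGD) is needed. The learning rates below are the library defaults; the momentum value is an assumed illustration:

from tensorflow.keras import optimizers

sgd = optimizers.SGD(learning_rate=0.01, momentum=0.9)  # momentum=0.9 is an assumed value
rmsprop = optimizers.RMSprop(learning_rate=0.001)
adam = optimizers.Adam(learning_rate=0.001)
# e.g., model.compile(loss='binary_crossentropy', optimizer=adam, metrics=[tf.keras.metrics.Recall()])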

For now, let's try the SGD optimizer with Recall as the metric and see the model's summary.

In the case of the bank, both precision and recall are important: a false positive leads to unnecessary promotional expense, and a false negative leads to missed revenue. However, the cost of missed revenue is much greater than the marginal promotional expense. Hence Recall is the chosen metric to optimize for.
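
In confusion-matrix terms, recall = TP / (TP + FN) and precision = TP / (TP + FP), so maximizing recall directly reduces the costly false negatives. A quick check with purely hypothetical counts:

# Purely hypothetical counts, for illustration only
tp = 150  # churners correctly flagged
fn = 50   # churners missed (lost revenue)
fp = 80   # loyal customers flagged (wasted promotion)
recall = tp / (tp + fn)     # 0.75 -- share of actual churners we catch
precision = tp / (tp + fp)  # ~0.65 -- share of flagged customers who actually churn
print(recall, precision)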

In [33]:
# Compiling the model with binary crossentropy as loss, SGD as optimizer, and Recall as the metric
#model_1.compile(loss = 'binary_crossentropy', optimizer = 'SGD', metrics = ['accuracy'])
model_1.compile(loss = 'binary_crossentropy', optimizer = 'SGD', metrics=[tf.keras.metrics.Recall()])

# Printing the summary of the model
model_1.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 128)               1536      
                                                                 
 dense_1 (Dense)             (None, 64)                8256      
                                                                 
 dense_2 (Dense)             (None, 1)                 65        
                                                                 
=================================================================
Total params: 9,857
Trainable params: 9,857
Non-trainable params: 0
_________________________________________________________________

From the above summary, we can see that this architecture will train a total of 9,857 parameters, i.e., weights and biases in the network.
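
The total follows directly from each Dense layer contributing (inputs x neurons) weights plus one bias per neuron:

# Parameters per Dense layer = inputs * neurons + neurons (one bias each)
layer_1 = 11 * 128 + 128  # 1,536 -- 11 inputs feeding 128 neurons
layer_2 = 128 * 64 + 64   # 8,256
layer_3 = 64 * 1 + 1      # 65
print(layer_1 + layer_2 + layer_3)  # 9,857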

Training the model¶

Let's now train the model using the below piece of code. We will keep 10% of the training data for validation.

In [34]:
history_1 = model_1.fit(X_train, y_train,validation_split = 0.1,epochs = 150,verbose = 2)
Epoch 1/150
225/225 - 1s - loss: 0.5418 - recall: 0.0657 - val_loss: 0.4740 - val_recall: 0.0000e+00 - 1s/epoch - 5ms/step
Epoch 2/150
225/225 - 0s - loss: 0.4694 - recall: 0.0014 - val_loss: 0.4435 - val_recall: 0.0129 - 407ms/epoch - 2ms/step
Epoch 3/150
225/225 - 0s - loss: 0.4456 - recall: 0.0460 - val_loss: 0.4248 - val_recall: 0.0903 - 370ms/epoch - 2ms/step
Epoch 4/150
225/225 - 0s - loss: 0.4307 - recall: 0.1131 - val_loss: 0.4119 - val_recall: 0.1677 - 347ms/epoch - 2ms/step
Epoch 5/150
225/225 - 0s - loss: 0.4206 - recall: 0.1774 - val_loss: 0.4021 - val_recall: 0.2258 - 384ms/epoch - 2ms/step
Epoch 6/150
225/225 - 0s - loss: 0.4130 - recall: 0.2261 - val_loss: 0.3945 - val_recall: 0.2839 - 379ms/epoch - 2ms/step
Epoch 7/150
225/225 - 0s - loss: 0.4066 - recall: 0.2729 - val_loss: 0.3868 - val_recall: 0.2839 - 342ms/epoch - 2ms/step
Epoch 8/150
225/225 - 0s - loss: 0.4004 - recall: 0.2993 - val_loss: 0.3799 - val_recall: 0.2968 - 386ms/epoch - 2ms/step
Epoch 9/150
225/225 - 0s - loss: 0.3946 - recall: 0.3128 - val_loss: 0.3739 - val_recall: 0.3677 - 379ms/epoch - 2ms/step
Epoch 10/150
225/225 - 0s - loss: 0.3888 - recall: 0.3480 - val_loss: 0.3665 - val_recall: 0.3419 - 348ms/epoch - 2ms/step
Epoch 11/150
225/225 - 0s - loss: 0.3834 - recall: 0.3453 - val_loss: 0.3605 - val_recall: 0.3871 - 338ms/epoch - 2ms/step
Epoch 12/150
225/225 - 0s - loss: 0.3779 - recall: 0.3676 - val_loss: 0.3540 - val_recall: 0.3677 - 339ms/epoch - 2ms/step
Epoch 13/150
225/225 - 0s - loss: 0.3727 - recall: 0.3859 - val_loss: 0.3476 - val_recall: 0.3742 - 355ms/epoch - 2ms/step
Epoch 14/150
225/225 - 0s - loss: 0.3681 - recall: 0.3771 - val_loss: 0.3431 - val_recall: 0.4258 - 380ms/epoch - 2ms/step
Epoch 15/150
225/225 - 0s - loss: 0.3637 - recall: 0.3988 - val_loss: 0.3372 - val_recall: 0.4065 - 350ms/epoch - 2ms/step
Epoch 16/150
225/225 - 0s - loss: 0.3599 - recall: 0.4076 - val_loss: 0.3329 - val_recall: 0.3806 - 360ms/epoch - 2ms/step
Epoch 17/150
225/225 - 0s - loss: 0.3566 - recall: 0.4218 - val_loss: 0.3290 - val_recall: 0.3806 - 380ms/epoch - 2ms/step
Epoch 18/150
225/225 - 0s - loss: 0.3534 - recall: 0.4259 - val_loss: 0.3250 - val_recall: 0.4065 - 337ms/epoch - 1ms/step
Epoch 19/150
225/225 - 0s - loss: 0.3508 - recall: 0.4367 - val_loss: 0.3223 - val_recall: 0.4387 - 352ms/epoch - 2ms/step
Epoch 20/150
225/225 - 0s - loss: 0.3483 - recall: 0.4469 - val_loss: 0.3201 - val_recall: 0.4065 - 329ms/epoch - 1ms/step
Epoch 21/150
225/225 - 0s - loss: 0.3464 - recall: 0.4462 - val_loss: 0.3178 - val_recall: 0.4258 - 334ms/epoch - 1ms/step
Epoch 22/150
225/225 - 0s - loss: 0.3445 - recall: 0.4550 - val_loss: 0.3157 - val_recall: 0.4258 - 397ms/epoch - 2ms/step
Epoch 23/150
225/225 - 0s - loss: 0.3428 - recall: 0.4543 - val_loss: 0.3149 - val_recall: 0.4710 - 391ms/epoch - 2ms/step
Epoch 24/150
225/225 - 0s - loss: 0.3414 - recall: 0.4597 - val_loss: 0.3129 - val_recall: 0.4516 - 389ms/epoch - 2ms/step
Epoch 25/150
225/225 - 0s - loss: 0.3401 - recall: 0.4617 - val_loss: 0.3115 - val_recall: 0.4387 - 379ms/epoch - 2ms/step
Epoch 26/150
225/225 - 0s - loss: 0.3391 - recall: 0.4672 - val_loss: 0.3103 - val_recall: 0.4387 - 357ms/epoch - 2ms/step
Epoch 27/150
225/225 - 0s - loss: 0.3379 - recall: 0.4685 - val_loss: 0.3092 - val_recall: 0.4516 - 377ms/epoch - 2ms/step
Epoch 28/150
225/225 - 1s - loss: 0.3366 - recall: 0.4665 - val_loss: 0.3111 - val_recall: 0.5097 - 669ms/epoch - 3ms/step
Epoch 29/150
225/225 - 1s - loss: 0.3359 - recall: 0.4766 - val_loss: 0.3078 - val_recall: 0.4645 - 750ms/epoch - 3ms/step
Epoch 30/150
225/225 - 0s - loss: 0.3351 - recall: 0.4733 - val_loss: 0.3081 - val_recall: 0.4645 - 374ms/epoch - 2ms/step
Epoch 31/150
225/225 - 0s - loss: 0.3340 - recall: 0.4780 - val_loss: 0.3075 - val_recall: 0.4645 - 395ms/epoch - 2ms/step
Epoch 32/150
225/225 - 0s - loss: 0.3334 - recall: 0.4766 - val_loss: 0.3073 - val_recall: 0.4645 - 374ms/epoch - 2ms/step
Epoch 33/150
225/225 - 0s - loss: 0.3326 - recall: 0.4780 - val_loss: 0.3067 - val_recall: 0.4581 - 385ms/epoch - 2ms/step
Epoch 34/150
225/225 - 0s - loss: 0.3315 - recall: 0.4827 - val_loss: 0.3070 - val_recall: 0.4452 - 384ms/epoch - 2ms/step
Epoch 35/150
225/225 - 0s - loss: 0.3311 - recall: 0.4787 - val_loss: 0.3067 - val_recall: 0.4581 - 354ms/epoch - 2ms/step
Epoch 36/150
225/225 - 0s - loss: 0.3303 - recall: 0.4800 - val_loss: 0.3058 - val_recall: 0.4968 - 393ms/epoch - 2ms/step
Epoch 37/150
225/225 - 0s - loss: 0.3298 - recall: 0.4800 - val_loss: 0.3054 - val_recall: 0.4645 - 383ms/epoch - 2ms/step
Epoch 38/150
225/225 - 0s - loss: 0.3290 - recall: 0.4848 - val_loss: 0.3050 - val_recall: 0.4774 - 338ms/epoch - 2ms/step
Epoch 39/150
225/225 - 0s - loss: 0.3286 - recall: 0.4807 - val_loss: 0.3056 - val_recall: 0.4968 - 359ms/epoch - 2ms/step
Epoch 40/150
225/225 - 0s - loss: 0.3278 - recall: 0.4922 - val_loss: 0.3044 - val_recall: 0.4710 - 332ms/epoch - 1ms/step
Epoch 41/150
225/225 - 0s - loss: 0.3273 - recall: 0.4834 - val_loss: 0.3050 - val_recall: 0.4581 - 375ms/epoch - 2ms/step
Epoch 42/150
225/225 - 0s - loss: 0.3266 - recall: 0.4909 - val_loss: 0.3046 - val_recall: 0.4645 - 374ms/epoch - 2ms/step
Epoch 43/150
225/225 - 0s - loss: 0.3259 - recall: 0.4848 - val_loss: 0.3048 - val_recall: 0.4968 - 335ms/epoch - 1ms/step
Epoch 44/150
225/225 - 0s - loss: 0.3256 - recall: 0.4949 - val_loss: 0.3042 - val_recall: 0.4710 - 370ms/epoch - 2ms/step
Epoch 45/150
225/225 - 0s - loss: 0.3251 - recall: 0.4936 - val_loss: 0.3050 - val_recall: 0.4710 - 347ms/epoch - 2ms/step
Epoch 46/150
225/225 - 0s - loss: 0.3243 - recall: 0.4976 - val_loss: 0.3047 - val_recall: 0.4710 - 342ms/epoch - 2ms/step
Epoch 47/150
225/225 - 0s - loss: 0.3237 - recall: 0.4922 - val_loss: 0.3041 - val_recall: 0.4903 - 388ms/epoch - 2ms/step
Epoch 48/150
225/225 - 0s - loss: 0.3232 - recall: 0.4929 - val_loss: 0.3048 - val_recall: 0.4774 - 340ms/epoch - 2ms/step
Epoch 49/150
225/225 - 0s - loss: 0.3226 - recall: 0.4922 - val_loss: 0.3046 - val_recall: 0.4774 - 381ms/epoch - 2ms/step
Epoch 50/150
225/225 - 0s - loss: 0.3220 - recall: 0.4970 - val_loss: 0.3035 - val_recall: 0.4774 - 392ms/epoch - 2ms/step
Epoch 51/150
225/225 - 0s - loss: 0.3215 - recall: 0.4963 - val_loss: 0.3053 - val_recall: 0.4903 - 349ms/epoch - 2ms/step
Epoch 52/150
225/225 - 0s - loss: 0.3211 - recall: 0.4963 - val_loss: 0.3033 - val_recall: 0.4968 - 369ms/epoch - 2ms/step
Epoch 53/150
225/225 - 0s - loss: 0.3207 - recall: 0.5071 - val_loss: 0.3033 - val_recall: 0.4516 - 374ms/epoch - 2ms/step
Epoch 54/150
225/225 - 0s - loss: 0.3201 - recall: 0.4956 - val_loss: 0.3045 - val_recall: 0.4581 - 374ms/epoch - 2ms/step
Epoch 55/150
225/225 - 0s - loss: 0.3198 - recall: 0.4976 - val_loss: 0.3035 - val_recall: 0.4839 - 380ms/epoch - 2ms/step
Epoch 56/150
225/225 - 0s - loss: 0.3192 - recall: 0.5030 - val_loss: 0.3039 - val_recall: 0.4839 - 374ms/epoch - 2ms/step
Epoch 57/150
225/225 - 0s - loss: 0.3186 - recall: 0.4983 - val_loss: 0.3028 - val_recall: 0.4903 - 342ms/epoch - 2ms/step
Epoch 58/150
225/225 - 0s - loss: 0.3183 - recall: 0.5146 - val_loss: 0.3041 - val_recall: 0.4645 - 357ms/epoch - 2ms/step
Epoch 59/150
225/225 - 0s - loss: 0.3180 - recall: 0.4997 - val_loss: 0.3029 - val_recall: 0.4774 - 383ms/epoch - 2ms/step
Epoch 60/150
225/225 - 0s - loss: 0.3172 - recall: 0.5030 - val_loss: 0.3037 - val_recall: 0.4774 - 340ms/epoch - 2ms/step
Epoch 61/150
225/225 - 0s - loss: 0.3166 - recall: 0.5152 - val_loss: 0.3043 - val_recall: 0.4452 - 351ms/epoch - 2ms/step
Epoch 62/150
225/225 - 0s - loss: 0.3164 - recall: 0.5017 - val_loss: 0.3043 - val_recall: 0.4774 - 372ms/epoch - 2ms/step
Epoch 63/150
225/225 - 0s - loss: 0.3157 - recall: 0.5037 - val_loss: 0.3027 - val_recall: 0.4645 - 365ms/epoch - 2ms/step
Epoch 64/150
225/225 - 0s - loss: 0.3156 - recall: 0.5051 - val_loss: 0.3026 - val_recall: 0.4710 - 393ms/epoch - 2ms/step
Epoch 65/150
225/225 - 0s - loss: 0.3148 - recall: 0.5024 - val_loss: 0.3041 - val_recall: 0.4839 - 350ms/epoch - 2ms/step
Epoch 66/150
225/225 - 0s - loss: 0.3144 - recall: 0.5071 - val_loss: 0.3050 - val_recall: 0.4839 - 339ms/epoch - 2ms/step
Epoch 67/150
225/225 - 0s - loss: 0.3142 - recall: 0.5146 - val_loss: 0.3032 - val_recall: 0.4839 - 355ms/epoch - 2ms/step
Epoch 68/150
225/225 - 0s - loss: 0.3137 - recall: 0.5098 - val_loss: 0.3038 - val_recall: 0.4839 - 344ms/epoch - 2ms/step
Epoch 69/150
225/225 - 0s - loss: 0.3132 - recall: 0.5105 - val_loss: 0.3036 - val_recall: 0.4968 - 369ms/epoch - 2ms/step
Epoch 70/150
225/225 - 0s - loss: 0.3128 - recall: 0.5105 - val_loss: 0.3033 - val_recall: 0.4839 - 386ms/epoch - 2ms/step
Epoch 71/150
225/225 - 0s - loss: 0.3124 - recall: 0.5105 - val_loss: 0.3033 - val_recall: 0.4968 - 330ms/epoch - 1ms/step
Epoch 72/150
225/225 - 0s - loss: 0.3121 - recall: 0.5152 - val_loss: 0.3029 - val_recall: 0.4903 - 380ms/epoch - 2ms/step
Epoch 73/150
225/225 - 0s - loss: 0.3117 - recall: 0.5146 - val_loss: 0.3030 - val_recall: 0.4839 - 353ms/epoch - 2ms/step
Epoch 74/150
225/225 - 0s - loss: 0.3107 - recall: 0.5125 - val_loss: 0.3015 - val_recall: 0.4968 - 370ms/epoch - 2ms/step
Epoch 75/150
225/225 - 0s - loss: 0.3105 - recall: 0.5179 - val_loss: 0.3030 - val_recall: 0.4968 - 416ms/epoch - 2ms/step
Epoch 76/150
225/225 - 0s - loss: 0.3103 - recall: 0.5295 - val_loss: 0.3032 - val_recall: 0.4774 - 404ms/epoch - 2ms/step
Epoch 77/150
225/225 - 0s - loss: 0.3101 - recall: 0.5166 - val_loss: 0.3057 - val_recall: 0.4839 - 340ms/epoch - 2ms/step
Epoch 78/150
225/225 - 0s - loss: 0.3095 - recall: 0.5166 - val_loss: 0.3042 - val_recall: 0.4968 - 362ms/epoch - 2ms/step
Epoch 79/150
225/225 - 0s - loss: 0.3087 - recall: 0.5173 - val_loss: 0.3047 - val_recall: 0.4968 - 341ms/epoch - 2ms/step
Epoch 80/150
225/225 - 0s - loss: 0.3087 - recall: 0.5159 - val_loss: 0.3038 - val_recall: 0.4968 - 358ms/epoch - 2ms/step
Epoch 81/150
225/225 - 0s - loss: 0.3081 - recall: 0.5240 - val_loss: 0.3028 - val_recall: 0.4903 - 355ms/epoch - 2ms/step
Epoch 82/150
225/225 - 0s - loss: 0.3078 - recall: 0.5213 - val_loss: 0.3028 - val_recall: 0.4839 - 373ms/epoch - 2ms/step
Epoch 83/150
225/225 - 0s - loss: 0.3076 - recall: 0.5206 - val_loss: 0.3036 - val_recall: 0.5032 - 373ms/epoch - 2ms/step
Epoch 84/150
225/225 - 0s - loss: 0.3074 - recall: 0.5234 - val_loss: 0.3057 - val_recall: 0.4968 - 378ms/epoch - 2ms/step
Epoch 85/150
225/225 - 0s - loss: 0.3066 - recall: 0.5206 - val_loss: 0.3025 - val_recall: 0.4903 - 367ms/epoch - 2ms/step
Epoch 86/150
225/225 - 0s - loss: 0.3063 - recall: 0.5227 - val_loss: 0.3019 - val_recall: 0.4839 - 361ms/epoch - 2ms/step
Epoch 87/150
225/225 - 0s - loss: 0.3062 - recall: 0.5227 - val_loss: 0.3044 - val_recall: 0.4903 - 346ms/epoch - 2ms/step
Epoch 88/150
225/225 - 0s - loss: 0.3055 - recall: 0.5267 - val_loss: 0.3054 - val_recall: 0.5161 - 376ms/epoch - 2ms/step
Epoch 89/150
225/225 - 0s - loss: 0.3050 - recall: 0.5254 - val_loss: 0.3038 - val_recall: 0.4903 - 382ms/epoch - 2ms/step
Epoch 90/150
225/225 - 0s - loss: 0.3043 - recall: 0.5274 - val_loss: 0.3043 - val_recall: 0.4903 - 350ms/epoch - 2ms/step
Epoch 91/150
225/225 - 0s - loss: 0.3042 - recall: 0.5267 - val_loss: 0.3055 - val_recall: 0.4839 - 341ms/epoch - 2ms/step
Epoch 92/150
225/225 - 0s - loss: 0.3040 - recall: 0.5281 - val_loss: 0.3042 - val_recall: 0.4968 - 375ms/epoch - 2ms/step
Epoch 93/150
225/225 - 0s - loss: 0.3035 - recall: 0.5308 - val_loss: 0.3078 - val_recall: 0.5032 - 366ms/epoch - 2ms/step
Epoch 94/150
225/225 - 0s - loss: 0.3029 - recall: 0.5274 - val_loss: 0.3062 - val_recall: 0.4839 - 333ms/epoch - 1ms/step
Epoch 95/150
225/225 - 0s - loss: 0.3027 - recall: 0.5254 - val_loss: 0.3032 - val_recall: 0.5032 - 336ms/epoch - 1ms/step
Epoch 96/150
225/225 - 0s - loss: 0.3018 - recall: 0.5281 - val_loss: 0.3051 - val_recall: 0.4839 - 378ms/epoch - 2ms/step
Epoch 97/150
225/225 - 0s - loss: 0.3018 - recall: 0.5308 - val_loss: 0.3002 - val_recall: 0.4839 - 381ms/epoch - 2ms/step
Epoch 98/150
225/225 - 0s - loss: 0.3015 - recall: 0.5376 - val_loss: 0.3043 - val_recall: 0.5032 - 344ms/epoch - 2ms/step
Epoch 99/150
225/225 - 0s - loss: 0.3010 - recall: 0.5301 - val_loss: 0.3078 - val_recall: 0.5032 - 345ms/epoch - 2ms/step
Epoch 100/150
225/225 - 0s - loss: 0.3007 - recall: 0.5362 - val_loss: 0.3043 - val_recall: 0.4968 - 353ms/epoch - 2ms/step
Epoch 101/150
225/225 - 0s - loss: 0.3005 - recall: 0.5416 - val_loss: 0.3035 - val_recall: 0.4774 - 382ms/epoch - 2ms/step
Epoch 102/150
225/225 - 0s - loss: 0.3002 - recall: 0.5369 - val_loss: 0.3042 - val_recall: 0.4774 - 385ms/epoch - 2ms/step
Epoch 103/150
225/225 - 0s - loss: 0.2998 - recall: 0.5342 - val_loss: 0.3035 - val_recall: 0.4968 - 360ms/epoch - 2ms/step
Epoch 104/150
225/225 - 0s - loss: 0.2992 - recall: 0.5328 - val_loss: 0.3026 - val_recall: 0.4839 - 374ms/epoch - 2ms/step
Epoch 105/150
225/225 - 0s - loss: 0.2990 - recall: 0.5369 - val_loss: 0.3040 - val_recall: 0.4903 - 352ms/epoch - 2ms/step
Epoch 106/150
225/225 - 0s - loss: 0.2986 - recall: 0.5328 - val_loss: 0.3058 - val_recall: 0.4968 - 345ms/epoch - 2ms/step
Epoch 107/150
225/225 - 0s - loss: 0.2983 - recall: 0.5355 - val_loss: 0.3089 - val_recall: 0.4903 - 348ms/epoch - 2ms/step
Epoch 108/150
225/225 - 0s - loss: 0.2978 - recall: 0.5288 - val_loss: 0.3078 - val_recall: 0.5419 - 351ms/epoch - 2ms/step
Epoch 109/150
225/225 - 0s - loss: 0.2979 - recall: 0.5383 - val_loss: 0.3080 - val_recall: 0.5355 - 378ms/epoch - 2ms/step
Epoch 110/150
225/225 - 0s - loss: 0.2972 - recall: 0.5410 - val_loss: 0.3045 - val_recall: 0.4839 - 367ms/epoch - 2ms/step
Epoch 111/150
225/225 - 0s - loss: 0.2970 - recall: 0.5308 - val_loss: 0.3072 - val_recall: 0.4968 - 407ms/epoch - 2ms/step
Epoch 112/150
225/225 - 0s - loss: 0.2964 - recall: 0.5410 - val_loss: 0.3081 - val_recall: 0.5161 - 340ms/epoch - 2ms/step
Epoch 113/150
225/225 - 0s - loss: 0.2962 - recall: 0.5430 - val_loss: 0.3073 - val_recall: 0.4710 - 381ms/epoch - 2ms/step
Epoch 114/150
225/225 - 0s - loss: 0.2951 - recall: 0.5349 - val_loss: 0.3064 - val_recall: 0.4968 - 380ms/epoch - 2ms/step
Epoch 115/150
225/225 - 0s - loss: 0.2952 - recall: 0.5437 - val_loss: 0.3053 - val_recall: 0.5161 - 372ms/epoch - 2ms/step
Epoch 116/150
225/225 - 0s - loss: 0.2944 - recall: 0.5416 - val_loss: 0.3119 - val_recall: 0.5548 - 385ms/epoch - 2ms/step
Epoch 117/150
225/225 - 0s - loss: 0.2946 - recall: 0.5464 - val_loss: 0.3063 - val_recall: 0.5032 - 345ms/epoch - 2ms/step
Epoch 118/150
225/225 - 0s - loss: 0.2946 - recall: 0.5450 - val_loss: 0.3058 - val_recall: 0.5032 - 347ms/epoch - 2ms/step
Epoch 119/150
225/225 - 0s - loss: 0.2937 - recall: 0.5355 - val_loss: 0.3079 - val_recall: 0.5161 - 375ms/epoch - 2ms/step
Epoch 120/150
225/225 - 0s - loss: 0.2934 - recall: 0.5416 - val_loss: 0.3117 - val_recall: 0.5161 - 348ms/epoch - 2ms/step
Epoch 121/150
225/225 - 0s - loss: 0.2931 - recall: 0.5511 - val_loss: 0.3074 - val_recall: 0.5097 - 372ms/epoch - 2ms/step
Epoch 122/150
225/225 - 0s - loss: 0.2927 - recall: 0.5484 - val_loss: 0.3108 - val_recall: 0.4903 - 372ms/epoch - 2ms/step
Epoch 123/150
225/225 - 0s - loss: 0.2921 - recall: 0.5450 - val_loss: 0.3082 - val_recall: 0.5097 - 335ms/epoch - 1ms/step
Epoch 124/150
225/225 - 0s - loss: 0.2922 - recall: 0.5450 - val_loss: 0.3062 - val_recall: 0.5290 - 374ms/epoch - 2ms/step
Epoch 125/150
225/225 - 0s - loss: 0.2916 - recall: 0.5491 - val_loss: 0.3077 - val_recall: 0.4903 - 345ms/epoch - 2ms/step
Epoch 126/150
225/225 - 0s - loss: 0.2915 - recall: 0.5477 - val_loss: 0.3089 - val_recall: 0.5097 - 383ms/epoch - 2ms/step
Epoch 127/150
225/225 - 0s - loss: 0.2907 - recall: 0.5498 - val_loss: 0.3039 - val_recall: 0.5226 - 344ms/epoch - 2ms/step
Epoch 128/150
225/225 - 0s - loss: 0.2902 - recall: 0.5484 - val_loss: 0.3089 - val_recall: 0.5355 - 378ms/epoch - 2ms/step
Epoch 129/150
225/225 - 0s - loss: 0.2901 - recall: 0.5545 - val_loss: 0.3095 - val_recall: 0.4968 - 341ms/epoch - 2ms/step
Epoch 130/150
225/225 - 0s - loss: 0.2897 - recall: 0.5471 - val_loss: 0.3065 - val_recall: 0.4968 - 351ms/epoch - 2ms/step
Epoch 131/150
225/225 - 0s - loss: 0.2894 - recall: 0.5477 - val_loss: 0.3094 - val_recall: 0.5290 - 341ms/epoch - 2ms/step
Epoch 132/150
225/225 - 0s - loss: 0.2891 - recall: 0.5457 - val_loss: 0.3081 - val_recall: 0.4903 - 359ms/epoch - 2ms/step
Epoch 133/150
225/225 - 0s - loss: 0.2887 - recall: 0.5504 - val_loss: 0.3105 - val_recall: 0.5097 - 375ms/epoch - 2ms/step
Epoch 134/150
225/225 - 0s - loss: 0.2878 - recall: 0.5572 - val_loss: 0.3073 - val_recall: 0.5226 - 341ms/epoch - 2ms/step
Epoch 135/150
225/225 - 0s - loss: 0.2878 - recall: 0.5484 - val_loss: 0.3096 - val_recall: 0.5290 - 377ms/epoch - 2ms/step
Epoch 136/150
225/225 - 0s - loss: 0.2880 - recall: 0.5423 - val_loss: 0.3098 - val_recall: 0.5226 - 339ms/epoch - 2ms/step
Epoch 137/150
225/225 - 0s - loss: 0.2870 - recall: 0.5592 - val_loss: 0.3075 - val_recall: 0.5290 - 333ms/epoch - 1ms/step
Epoch 138/150
225/225 - 0s - loss: 0.2864 - recall: 0.5559 - val_loss: 0.3110 - val_recall: 0.5161 - 352ms/epoch - 2ms/step
Epoch 139/150
225/225 - 0s - loss: 0.2865 - recall: 0.5491 - val_loss: 0.3099 - val_recall: 0.4903 - 360ms/epoch - 2ms/step
Epoch 140/150
225/225 - 0s - loss: 0.2855 - recall: 0.5606 - val_loss: 0.3129 - val_recall: 0.5097 - 388ms/epoch - 2ms/step
Epoch 141/150
225/225 - 0s - loss: 0.2854 - recall: 0.5559 - val_loss: 0.3128 - val_recall: 0.4903 - 354ms/epoch - 2ms/step
Epoch 142/150
225/225 - 0s - loss: 0.2854 - recall: 0.5592 - val_loss: 0.3075 - val_recall: 0.5097 - 380ms/epoch - 2ms/step
Epoch 143/150
225/225 - 0s - loss: 0.2845 - recall: 0.5552 - val_loss: 0.3084 - val_recall: 0.5097 - 344ms/epoch - 2ms/step
Epoch 144/150
225/225 - 0s - loss: 0.2842 - recall: 0.5592 - val_loss: 0.3105 - val_recall: 0.5161 - 378ms/epoch - 2ms/step
Epoch 145/150
225/225 - 0s - loss: 0.2839 - recall: 0.5680 - val_loss: 0.3084 - val_recall: 0.5097 - 377ms/epoch - 2ms/step
Epoch 146/150
225/225 - 0s - loss: 0.2831 - recall: 0.5633 - val_loss: 0.3098 - val_recall: 0.5290 - 345ms/epoch - 2ms/step
Epoch 147/150
225/225 - 0s - loss: 0.2834 - recall: 0.5613 - val_loss: 0.3096 - val_recall: 0.4903 - 390ms/epoch - 2ms/step
Epoch 148/150
225/225 - 0s - loss: 0.2826 - recall: 0.5626 - val_loss: 0.3065 - val_recall: 0.4968 - 374ms/epoch - 2ms/step
Epoch 149/150
225/225 - 0s - loss: 0.2823 - recall: 0.5680 - val_loss: 0.3086 - val_recall: 0.5161 - 383ms/epoch - 2ms/step
Epoch 150/150
225/225 - 0s - loss: 0.2820 - recall: 0.5579 - val_loss: 0.3177 - val_recall: 0.5548 - 360ms/epoch - 2ms/step

Plotting the Loss vs Epoch Curve¶

In [35]:
plt.figure(figsize = (15, 8))
plt.plot(history_1.history['loss'])
plt.plot(history_1.history['val_loss'])
plt.title('Loss vs Epochs')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc = 'lower right')
plt.show()

Observations:

  • The training loss is smooth and, overall, decreases as the epochs increase.
  • The validation loss also decreases as the epochs increase.
  • The validation loss tracks the training loss closely; in fact, it is lower than the training loss, so there is no evidence of overfitting. The model is giving a generalized performance.
  • The validation loss flattens out well before 150 epochs, so an epoch count of around 50 would be a reasonable stopping point; see the early-stopping sketch after this list.
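
Rather than eyeballing the curve, Keras can stop training automatically once the validation loss stops improving. A minimal sketch (the patience of 10 epochs is an assumed value, not one used in this notebook):

from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=10,
                           restore_best_weights=True)
# history_1 = model_1.fit(X_train, y_train, validation_split=0.1,
#                         epochs=150, verbose=2, callbacks=[early_stop])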

Let's try to increase the model complexity by tuning some of the hyper-parameters mentioned earlier and check if we can improve the model performance. Out of all the options we have, let's change the number of hidden layers, the number of neurons in each hidden layer, and the optimizer from SGD to Adam. Also, since the validation loss became roughly constant after some epochs, let's try fewer epochs, which will also reduce the computation time.

First, we need to clear the previous model's state from the session. In Keras, we need a special command (backend.clear_session()) to do this; otherwise, the previous model's state remains in the backend. Also, let's fix the seed again after clearing the backend.

In [36]:
backend.clear_session()
In [37]:
# Fixing the seed for random number generators
np.random.seed(5)
random.seed(5)
tf.random.set_seed(5)
In [38]:
# We will be adding the layers sequentially
model_2 = Sequential()
# First hidden layer with 128 neurons and relu activation function, the input_shape tuple denotes number of independent variables
model_2.add(Dense(128, activation = 'relu', input_shape = (11, )))

# Second hidden layer with 64 neurons and relu activation function
model_2.add(Dense(64, activation = 'relu'))

# Third hidden layer with 32 neurons and relu activation function
model_2.add(Dense(32, activation = 'relu'))

# Output layer with only one neuron and sigmoid as activation function which will give the probability of customer churning  
model_2.add(Dense(1, activation = 'sigmoid'))
In [39]:
# Compiling the model with binary crossentropy as loss, Adam as optimizer, and Recall as the metric
model_2.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics=[tf.keras.metrics.Recall()])
# Printing the summary of the model
model_2.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 128)               1536      
                                                                 
 dense_1 (Dense)             (None, 64)                8256      
                                                                 
 dense_2 (Dense)             (None, 32)                2080      
                                                                 
 dense_3 (Dense)             (None, 1)                 33        
                                                                 
=================================================================
Total params: 11,905
Trainable params: 11,905
Non-trainable params: 0
_________________________________________________________________
In [40]:
# Fitting the model on train data
history_2 = model_2.fit(X_train,y_train,validation_split = 0.1,epochs = 60,verbose = 2)
Epoch 1/60
225/225 - 1s - loss: 0.4152 - recall: 0.2877 - val_loss: 0.3384 - val_recall: 0.5677 - 1s/epoch - 6ms/step
Epoch 2/60
225/225 - 0s - loss: 0.3549 - recall: 0.4685 - val_loss: 0.3193 - val_recall: 0.4387 - 451ms/epoch - 2ms/step
Epoch 3/60
225/225 - 0s - loss: 0.3443 - recall: 0.4726 - val_loss: 0.3186 - val_recall: 0.5677 - 459ms/epoch - 2ms/step
Epoch 4/60
225/225 - 0s - loss: 0.3368 - recall: 0.4997 - val_loss: 0.3183 - val_recall: 0.5097 - 478ms/epoch - 2ms/step
Epoch 5/60
225/225 - 0s - loss: 0.3293 - recall: 0.4942 - val_loss: 0.3187 - val_recall: 0.5548 - 415ms/epoch - 2ms/step
Epoch 6/60
225/225 - 0s - loss: 0.3255 - recall: 0.5058 - val_loss: 0.3232 - val_recall: 0.5032 - 427ms/epoch - 2ms/step
Epoch 7/60
225/225 - 0s - loss: 0.3217 - recall: 0.5118 - val_loss: 0.3148 - val_recall: 0.4839 - 426ms/epoch - 2ms/step
Epoch 8/60
225/225 - 0s - loss: 0.3175 - recall: 0.5173 - val_loss: 0.3115 - val_recall: 0.4710 - 434ms/epoch - 2ms/step
Epoch 9/60
225/225 - 0s - loss: 0.3133 - recall: 0.5261 - val_loss: 0.3154 - val_recall: 0.4839 - 445ms/epoch - 2ms/step
Epoch 10/60
225/225 - 0s - loss: 0.3094 - recall: 0.5410 - val_loss: 0.3226 - val_recall: 0.4258 - 477ms/epoch - 2ms/step
Epoch 11/60
225/225 - 0s - loss: 0.3043 - recall: 0.5328 - val_loss: 0.3264 - val_recall: 0.5355 - 433ms/epoch - 2ms/step
Epoch 12/60
225/225 - 0s - loss: 0.3031 - recall: 0.5416 - val_loss: 0.3246 - val_recall: 0.4968 - 443ms/epoch - 2ms/step
Epoch 13/60
225/225 - 0s - loss: 0.2962 - recall: 0.5572 - val_loss: 0.3357 - val_recall: 0.5290 - 428ms/epoch - 2ms/step
Epoch 14/60
225/225 - 0s - loss: 0.2926 - recall: 0.5443 - val_loss: 0.3336 - val_recall: 0.5548 - 455ms/epoch - 2ms/step
Epoch 15/60
225/225 - 0s - loss: 0.2892 - recall: 0.5667 - val_loss: 0.3347 - val_recall: 0.4581 - 442ms/epoch - 2ms/step
Epoch 16/60
225/225 - 0s - loss: 0.2857 - recall: 0.5579 - val_loss: 0.3289 - val_recall: 0.5032 - 424ms/epoch - 2ms/step
Epoch 17/60
225/225 - 0s - loss: 0.2817 - recall: 0.5755 - val_loss: 0.3405 - val_recall: 0.4710 - 424ms/epoch - 2ms/step
Epoch 18/60
225/225 - 0s - loss: 0.2763 - recall: 0.5768 - val_loss: 0.3367 - val_recall: 0.4516 - 442ms/epoch - 2ms/step
Epoch 19/60
225/225 - 0s - loss: 0.2737 - recall: 0.5863 - val_loss: 0.3434 - val_recall: 0.5097 - 444ms/epoch - 2ms/step
Epoch 20/60
225/225 - 0s - loss: 0.2707 - recall: 0.5985 - val_loss: 0.3476 - val_recall: 0.4194 - 466ms/epoch - 2ms/step
Epoch 21/60
225/225 - 0s - loss: 0.2652 - recall: 0.6039 - val_loss: 0.3645 - val_recall: 0.4645 - 431ms/epoch - 2ms/step
Epoch 22/60
225/225 - 0s - loss: 0.2597 - recall: 0.6141 - val_loss: 0.3599 - val_recall: 0.4516 - 464ms/epoch - 2ms/step
Epoch 23/60
225/225 - 0s - loss: 0.2557 - recall: 0.6181 - val_loss: 0.3629 - val_recall: 0.5161 - 426ms/epoch - 2ms/step
Epoch 24/60
225/225 - 0s - loss: 0.2531 - recall: 0.6263 - val_loss: 0.3531 - val_recall: 0.4968 - 458ms/epoch - 2ms/step
Epoch 25/60
225/225 - 0s - loss: 0.2479 - recall: 0.6269 - val_loss: 0.3785 - val_recall: 0.5097 - 472ms/epoch - 2ms/step
Epoch 26/60
225/225 - 0s - loss: 0.2472 - recall: 0.6412 - val_loss: 0.3917 - val_recall: 0.4258 - 458ms/epoch - 2ms/step
Epoch 27/60
225/225 - 0s - loss: 0.2415 - recall: 0.6412 - val_loss: 0.3854 - val_recall: 0.4387 - 411ms/epoch - 2ms/step
Epoch 28/60
225/225 - 0s - loss: 0.2378 - recall: 0.6534 - val_loss: 0.3907 - val_recall: 0.5613 - 415ms/epoch - 2ms/step
Epoch 29/60
225/225 - 0s - loss: 0.2339 - recall: 0.6628 - val_loss: 0.3841 - val_recall: 0.4968 - 430ms/epoch - 2ms/step
Epoch 30/60
225/225 - 0s - loss: 0.2272 - recall: 0.6811 - val_loss: 0.3979 - val_recall: 0.5677 - 458ms/epoch - 2ms/step
Epoch 31/60
225/225 - 0s - loss: 0.2270 - recall: 0.6798 - val_loss: 0.4347 - val_recall: 0.4387 - 473ms/epoch - 2ms/step
Epoch 32/60
225/225 - 0s - loss: 0.2232 - recall: 0.6798 - val_loss: 0.4169 - val_recall: 0.4968 - 457ms/epoch - 2ms/step
Epoch 33/60
225/225 - 0s - loss: 0.2200 - recall: 0.6804 - val_loss: 0.4191 - val_recall: 0.4645 - 477ms/epoch - 2ms/step
Epoch 34/60
225/225 - 0s - loss: 0.2138 - recall: 0.6892 - val_loss: 0.4264 - val_recall: 0.4387 - 435ms/epoch - 2ms/step
Epoch 35/60
225/225 - 0s - loss: 0.2082 - recall: 0.7028 - val_loss: 0.4148 - val_recall: 0.5419 - 443ms/epoch - 2ms/step
Epoch 36/60
225/225 - 0s - loss: 0.2055 - recall: 0.7028 - val_loss: 0.4288 - val_recall: 0.4903 - 437ms/epoch - 2ms/step
Epoch 37/60
225/225 - 0s - loss: 0.2045 - recall: 0.7028 - val_loss: 0.4356 - val_recall: 0.5419 - 456ms/epoch - 2ms/step
Epoch 38/60
225/225 - 0s - loss: 0.2032 - recall: 0.7116 - val_loss: 0.4506 - val_recall: 0.5355 - 490ms/epoch - 2ms/step
Epoch 39/60
225/225 - 0s - loss: 0.1988 - recall: 0.7177 - val_loss: 0.4615 - val_recall: 0.6258 - 449ms/epoch - 2ms/step
Epoch 40/60
225/225 - 0s - loss: 0.1940 - recall: 0.7271 - val_loss: 0.4628 - val_recall: 0.5613 - 442ms/epoch - 2ms/step
Epoch 41/60
225/225 - 0s - loss: 0.1883 - recall: 0.7326 - val_loss: 0.4581 - val_recall: 0.5290 - 438ms/epoch - 2ms/step
Epoch 42/60
225/225 - 0s - loss: 0.1860 - recall: 0.7285 - val_loss: 0.4757 - val_recall: 0.5419 - 420ms/epoch - 2ms/step
Epoch 43/60
225/225 - 0s - loss: 0.1830 - recall: 0.7292 - val_loss: 0.5070 - val_recall: 0.4968 - 424ms/epoch - 2ms/step
Epoch 44/60
225/225 - 0s - loss: 0.1808 - recall: 0.7400 - val_loss: 0.5081 - val_recall: 0.5419 - 421ms/epoch - 2ms/step
Epoch 45/60
225/225 - 0s - loss: 0.1767 - recall: 0.7502 - val_loss: 0.5041 - val_recall: 0.5548 - 471ms/epoch - 2ms/step
Epoch 46/60
225/225 - 0s - loss: 0.1685 - recall: 0.7603 - val_loss: 0.5014 - val_recall: 0.4774 - 451ms/epoch - 2ms/step
Epoch 47/60
225/225 - 0s - loss: 0.1690 - recall: 0.7624 - val_loss: 0.5259 - val_recall: 0.5613 - 472ms/epoch - 2ms/step
Epoch 48/60
225/225 - 0s - loss: 0.1618 - recall: 0.7739 - val_loss: 0.5456 - val_recall: 0.5161 - 454ms/epoch - 2ms/step
Epoch 49/60
225/225 - 0s - loss: 0.1652 - recall: 0.7759 - val_loss: 0.5407 - val_recall: 0.5484 - 467ms/epoch - 2ms/step
Epoch 50/60
225/225 - 0s - loss: 0.1600 - recall: 0.7759 - val_loss: 0.5765 - val_recall: 0.4645 - 415ms/epoch - 2ms/step
Epoch 51/60
225/225 - 0s - loss: 0.1602 - recall: 0.7759 - val_loss: 0.5397 - val_recall: 0.5355 - 433ms/epoch - 2ms/step
Epoch 52/60
225/225 - 0s - loss: 0.1525 - recall: 0.7915 - val_loss: 0.5639 - val_recall: 0.6000 - 454ms/epoch - 2ms/step
Epoch 53/60
225/225 - 0s - loss: 0.1532 - recall: 0.7854 - val_loss: 0.5801 - val_recall: 0.5097 - 460ms/epoch - 2ms/step
Epoch 54/60
225/225 - 0s - loss: 0.1474 - recall: 0.8003 - val_loss: 0.5996 - val_recall: 0.5419 - 458ms/epoch - 2ms/step
Epoch 55/60
225/225 - 0s - loss: 0.1430 - recall: 0.7982 - val_loss: 0.5768 - val_recall: 0.5097 - 478ms/epoch - 2ms/step
Epoch 56/60
225/225 - 0s - loss: 0.1435 - recall: 0.8037 - val_loss: 0.6127 - val_recall: 0.5032 - 428ms/epoch - 2ms/step
Epoch 57/60
225/225 - 0s - loss: 0.1407 - recall: 0.8111 - val_loss: 0.5812 - val_recall: 0.5290 - 440ms/epoch - 2ms/step
Epoch 58/60
225/225 - 0s - loss: 0.1387 - recall: 0.8165 - val_loss: 0.6014 - val_recall: 0.5355 - 475ms/epoch - 2ms/step
Epoch 59/60
225/225 - 0s - loss: 0.1308 - recall: 0.8226 - val_loss: 0.6440 - val_recall: 0.5677 - 447ms/epoch - 2ms/step
Epoch 60/60
225/225 - 0s - loss: 0.1319 - recall: 0.8267 - val_loss: 0.6193 - val_recall: 0.5097 - 425ms/epoch - 2ms/step
In [41]:
plt.figure(figsize = (15, 8))
plt.plot(history_2.history['loss'])
plt.plot(history_2.history['val_loss'])
plt.title('loss vs Epochs')
plt.ylabel('loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc = 'lower right')
plt.show()

Observations:

  • From the above plot, we can see that the training loss keeps falling while the validation loss rises after the first few epochs, i.e., the model is overfitting.

Let's further tune some of the hyper-parameters and check whether we can overcome the overfitting issue.

We will use learning_rate = 0.001 for the Adam optimizer in the next training run.
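Another common guard against overfitting, not applied in this notebook, is early stopping: halt training once the validation loss stops improving rather than running all epochs. A minimal sketch using Keras's EarlyStopping callback (the patience value is an illustrative assumption, not a tuned choice):

from tensorflow.keras.callbacks import EarlyStopping

# Stop training once val_loss has not improved for 5 consecutive epochs,
# and roll back to the weights from the best epoch seen so far
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

# The callback plugs into fit() alongside the existing arguments, e.g.:
# history = model.fit(X_train, y_train, validation_split=0.1,
#                     epochs=60, verbose=2, callbacks=[early_stop])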

In [42]:
backend.clear_session()
# Fixing the seed for random number generators
np.random.seed(5)
random.seed(5)
tf.random.set_seed(5)
In [43]:
# We will be adding the layers sequentially
model_3 = Sequential()
# First hidden layer with 128 neurons and relu activation function, the input_shape tuple denotes number of independent variables
model_3.add(Dense(128, activation = 'relu', input_shape = (11, )))

# Second hidden layer with 64 neurons and relu activation function
model_3.add(Dense(64, activation = 'relu'))

# Third hidden layer with 32 neurons and relu activation function
model_3.add(Dense(32, activation = 'relu'))

# Output layer with only one neuron and sigmoid as activation function, which will give the probability of the customer exiting
model_3.add(Dense(1, activation = 'sigmoid'))
In [44]:
# Compiling the model with binary crossentropy as loss, Adam (learning rate 0.001) as optimizer, and recall as the metric
model_3.compile(loss = 'binary_crossentropy', optimizer = tf.keras.optimizers.Adam(learning_rate = 0.001), metrics=[tf.keras.metrics.Recall()])
# Printing the summary of the model
model_3.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 128)               1536      
                                                                 
 dense_1 (Dense)             (None, 64)                8256      
                                                                 
 dense_2 (Dense)             (None, 32)                2080      
                                                                 
 dense_3 (Dense)             (None, 1)                 33        
                                                                 
=================================================================
Total params: 11,905
Trainable params: 11,905
Non-trainable params: 0
_________________________________________________________________
  • Notice that the number of trainable parameters has increased substantially in comparison to previous models.
In [45]:
# Fitting the model on the train data
history_3 = model_3.fit(X_train,y_train,validation_split = 0.1,epochs = 50,verbose = 2)
Epoch 1/50
225/225 - 1s - loss: 0.4152 - recall: 0.2877 - val_loss: 0.3384 - val_recall: 0.5677 - 1s/epoch - 6ms/step
Epoch 2/50
225/225 - 0s - loss: 0.3549 - recall: 0.4685 - val_loss: 0.3193 - val_recall: 0.4387 - 444ms/epoch - 2ms/step
Epoch 3/50
225/225 - 0s - loss: 0.3443 - recall: 0.4726 - val_loss: 0.3186 - val_recall: 0.5677 - 454ms/epoch - 2ms/step
Epoch 4/50
225/225 - 0s - loss: 0.3368 - recall: 0.4997 - val_loss: 0.3183 - val_recall: 0.5097 - 428ms/epoch - 2ms/step
Epoch 5/50
225/225 - 0s - loss: 0.3293 - recall: 0.4942 - val_loss: 0.3187 - val_recall: 0.5548 - 461ms/epoch - 2ms/step
Epoch 6/50
225/225 - 0s - loss: 0.3255 - recall: 0.5058 - val_loss: 0.3232 - val_recall: 0.5032 - 468ms/epoch - 2ms/step
Epoch 7/50
225/225 - 0s - loss: 0.3217 - recall: 0.5118 - val_loss: 0.3148 - val_recall: 0.4839 - 472ms/epoch - 2ms/step
Epoch 8/50
225/225 - 0s - loss: 0.3175 - recall: 0.5173 - val_loss: 0.3115 - val_recall: 0.4710 - 455ms/epoch - 2ms/step
Epoch 9/50
225/225 - 0s - loss: 0.3133 - recall: 0.5261 - val_loss: 0.3154 - val_recall: 0.4839 - 469ms/epoch - 2ms/step
Epoch 10/50
225/225 - 0s - loss: 0.3094 - recall: 0.5410 - val_loss: 0.3226 - val_recall: 0.4258 - 450ms/epoch - 2ms/step
Epoch 11/50
225/225 - 0s - loss: 0.3043 - recall: 0.5328 - val_loss: 0.3264 - val_recall: 0.5355 - 474ms/epoch - 2ms/step
Epoch 12/50
225/225 - 0s - loss: 0.3031 - recall: 0.5416 - val_loss: 0.3246 - val_recall: 0.4968 - 476ms/epoch - 2ms/step
Epoch 13/50
225/225 - 0s - loss: 0.2962 - recall: 0.5572 - val_loss: 0.3357 - val_recall: 0.5290 - 421ms/epoch - 2ms/step
Epoch 14/50
225/225 - 0s - loss: 0.2926 - recall: 0.5443 - val_loss: 0.3336 - val_recall: 0.5548 - 469ms/epoch - 2ms/step
Epoch 15/50
225/225 - 0s - loss: 0.2892 - recall: 0.5667 - val_loss: 0.3347 - val_recall: 0.4581 - 426ms/epoch - 2ms/step
Epoch 16/50
225/225 - 0s - loss: 0.2857 - recall: 0.5579 - val_loss: 0.3289 - val_recall: 0.5032 - 435ms/epoch - 2ms/step
Epoch 17/50
225/225 - 0s - loss: 0.2817 - recall: 0.5755 - val_loss: 0.3405 - val_recall: 0.4710 - 457ms/epoch - 2ms/step
Epoch 18/50
225/225 - 0s - loss: 0.2763 - recall: 0.5768 - val_loss: 0.3367 - val_recall: 0.4516 - 474ms/epoch - 2ms/step
Epoch 19/50
225/225 - 0s - loss: 0.2737 - recall: 0.5863 - val_loss: 0.3434 - val_recall: 0.5097 - 436ms/epoch - 2ms/step
Epoch 20/50
225/225 - 0s - loss: 0.2707 - recall: 0.5985 - val_loss: 0.3476 - val_recall: 0.4194 - 427ms/epoch - 2ms/step
Epoch 21/50
225/225 - 0s - loss: 0.2652 - recall: 0.6039 - val_loss: 0.3645 - val_recall: 0.4645 - 428ms/epoch - 2ms/step
Epoch 22/50
225/225 - 0s - loss: 0.2597 - recall: 0.6141 - val_loss: 0.3599 - val_recall: 0.4516 - 422ms/epoch - 2ms/step
Epoch 23/50
225/225 - 0s - loss: 0.2557 - recall: 0.6181 - val_loss: 0.3629 - val_recall: 0.5161 - 446ms/epoch - 2ms/step
Epoch 24/50
225/225 - 0s - loss: 0.2531 - recall: 0.6263 - val_loss: 0.3531 - val_recall: 0.4968 - 465ms/epoch - 2ms/step
Epoch 25/50
225/225 - 0s - loss: 0.2479 - recall: 0.6269 - val_loss: 0.3785 - val_recall: 0.5097 - 471ms/epoch - 2ms/step
Epoch 26/50
225/225 - 0s - loss: 0.2472 - recall: 0.6412 - val_loss: 0.3917 - val_recall: 0.4258 - 439ms/epoch - 2ms/step
Epoch 27/50
225/225 - 0s - loss: 0.2415 - recall: 0.6412 - val_loss: 0.3854 - val_recall: 0.4387 - 456ms/epoch - 2ms/step
Epoch 28/50
225/225 - 0s - loss: 0.2378 - recall: 0.6534 - val_loss: 0.3907 - val_recall: 0.5613 - 454ms/epoch - 2ms/step
Epoch 29/50
225/225 - 0s - loss: 0.2339 - recall: 0.6628 - val_loss: 0.3841 - val_recall: 0.4968 - 431ms/epoch - 2ms/step
Epoch 30/50
225/225 - 0s - loss: 0.2272 - recall: 0.6811 - val_loss: 0.3979 - val_recall: 0.5677 - 457ms/epoch - 2ms/step
Epoch 31/50
225/225 - 0s - loss: 0.2270 - recall: 0.6798 - val_loss: 0.4347 - val_recall: 0.4387 - 418ms/epoch - 2ms/step
Epoch 32/50
225/225 - 0s - loss: 0.2232 - recall: 0.6798 - val_loss: 0.4169 - val_recall: 0.4968 - 459ms/epoch - 2ms/step
Epoch 33/50
225/225 - 0s - loss: 0.2200 - recall: 0.6804 - val_loss: 0.4191 - val_recall: 0.4645 - 437ms/epoch - 2ms/step
Epoch 34/50
225/225 - 0s - loss: 0.2138 - recall: 0.6892 - val_loss: 0.4264 - val_recall: 0.4387 - 458ms/epoch - 2ms/step
Epoch 35/50
225/225 - 0s - loss: 0.2082 - recall: 0.7028 - val_loss: 0.4148 - val_recall: 0.5419 - 458ms/epoch - 2ms/step
Epoch 36/50
225/225 - 0s - loss: 0.2055 - recall: 0.7028 - val_loss: 0.4288 - val_recall: 0.4903 - 451ms/epoch - 2ms/step
Epoch 37/50
225/225 - 0s - loss: 0.2045 - recall: 0.7028 - val_loss: 0.4356 - val_recall: 0.5419 - 452ms/epoch - 2ms/step
Epoch 38/50
225/225 - 0s - loss: 0.2032 - recall: 0.7116 - val_loss: 0.4506 - val_recall: 0.5355 - 423ms/epoch - 2ms/step
Epoch 39/50
225/225 - 0s - loss: 0.1988 - recall: 0.7177 - val_loss: 0.4615 - val_recall: 0.6258 - 463ms/epoch - 2ms/step
Epoch 40/50
225/225 - 0s - loss: 0.1940 - recall: 0.7271 - val_loss: 0.4628 - val_recall: 0.5613 - 477ms/epoch - 2ms/step
Epoch 41/50
225/225 - 0s - loss: 0.1883 - recall: 0.7326 - val_loss: 0.4581 - val_recall: 0.5290 - 419ms/epoch - 2ms/step
Epoch 42/50
225/225 - 0s - loss: 0.1860 - recall: 0.7285 - val_loss: 0.4757 - val_recall: 0.5419 - 437ms/epoch - 2ms/step
Epoch 43/50
225/225 - 0s - loss: 0.1830 - recall: 0.7292 - val_loss: 0.5070 - val_recall: 0.4968 - 465ms/epoch - 2ms/step
Epoch 44/50
225/225 - 0s - loss: 0.1808 - recall: 0.7400 - val_loss: 0.5081 - val_recall: 0.5419 - 474ms/epoch - 2ms/step
Epoch 45/50
225/225 - 0s - loss: 0.1767 - recall: 0.7502 - val_loss: 0.5041 - val_recall: 0.5548 - 467ms/epoch - 2ms/step
Epoch 46/50
225/225 - 0s - loss: 0.1685 - recall: 0.7603 - val_loss: 0.5014 - val_recall: 0.4774 - 435ms/epoch - 2ms/step
Epoch 47/50
225/225 - 0s - loss: 0.1690 - recall: 0.7624 - val_loss: 0.5259 - val_recall: 0.5613 - 455ms/epoch - 2ms/step
Epoch 48/50
225/225 - 0s - loss: 0.1618 - recall: 0.7739 - val_loss: 0.5456 - val_recall: 0.5161 - 444ms/epoch - 2ms/step
Epoch 49/50
225/225 - 0s - loss: 0.1652 - recall: 0.7759 - val_loss: 0.5407 - val_recall: 0.5484 - 421ms/epoch - 2ms/step
Epoch 50/50
225/225 - 0s - loss: 0.1600 - recall: 0.7759 - val_loss: 0.5765 - val_recall: 0.4645 - 461ms/epoch - 2ms/step
In [46]:
plt.figure(figsize = (15, 8))
plt.plot(history_3.history['loss'])
plt.plot(history_3.history['val_loss'])
plt.title('loss vs Epochs')
plt.ylabel('loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc = 'lower right')
plt.show()
  • We can observe that increasing the model complexity makes things worse: the validation loss diverges from the training loss even earlier, so the overfitting problem persists.
  • Let's take model_1 as the best model for predicting on the test data. (For reference, a dropout-regularized variant is sketched below.)
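The imports at the top include Dropout, though it is never used in this notebook. For reference, a hedged sketch of how the larger architecture could be regularized with dropout layers; the 0.3 rate is an illustrative assumption, and this variant was not trained here:

from tensorflow.keras.layers import Dropout

# Same architecture as model_3, with dropout after each hidden layer
model_reg = Sequential()
model_reg.add(Dense(128, activation='relu', input_shape=(11,)))
model_reg.add(Dropout(0.3))  # randomly zero 30% of activations during training
model_reg.add(Dense(64, activation='relu'))
model_reg.add(Dropout(0.3))
model_reg.add(Dense(32, activation='relu'))
model_reg.add(Dropout(0.3))
model_reg.add(Dense(1, activation='sigmoid'))
model_reg.compile(loss='binary_crossentropy',
                  optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  metrics=[tf.keras.metrics.Recall()])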
In [47]:
# Since we cleared the session, we need to recreate model_1.
# We will be adding the layers sequentially
model_1 = Sequential()

# First hidden layer with 128 neurons and relu activation function, the input_shape tuple denotes number of independent variables
model_1.add(Dense(128, activation = 'relu', input_shape = (11, )))

# Second hidden layer with 64 neurons and relu activation function
model_1.add(Dense(64, activation = 'relu'))

# Output layer with only one neuron and sigmoid as activation function which will give the probability of customer exiting
model_1.add(Dense(1, activation = 'sigmoid'))

# Compiling the model with binary crossentropy as loss, SGD as optimizer, and recall as the metric
model_1.compile(loss = 'binary_crossentropy', optimizer = 'SGD', metrics=[tf.keras.metrics.Recall()])

history_1 = model_1.fit(X_train, y_train,validation_split = 0.1,epochs = 150,verbose = 0)

model_1.evaluate(X_test, y_test, verbose = 1)

test_pred = np.round(model_1.predict(X_test))
63/63 [==============================] - 0s 2ms/step - loss: 0.3679 - recall_1: 0.5012

The test recall comes out to about 50%, and, as the classification report below shows, the test accuracy is about 84%, which implies that our model is able to replicate its train and validation performance on the test (unseen) data.

In [48]:
print(classification_report(y_test, test_pred))
cm = confusion_matrix(y_test, test_pred)
labels = np.asarray(
        [
            [
                "{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())
            ]  # flatten will reshape
            for item in cm.flatten()
        ]
    ).reshape(2, 2)


plt.figure(figsize = (8, 5))

sns.heatmap(cm, annot = labels,  fmt = '', xticklabels = ['Not Exited', 'Exited'], yticklabels = ['Not Exited', 'Exited'])

plt.ylabel('Actual')

plt.xlabel('Predicted')

plt.show()
              precision    recall  f1-score   support

           0       0.88      0.92      0.90      1595
           1       0.62      0.50      0.56       405

    accuracy                           0.84      2000
   macro avg       0.75      0.71      0.73      2000
weighted avg       0.83      0.84      0.83      2000

  • The confusion matrix shows that the model identifies only about half of the customers who will churn (class 1 recall = 0.50); a higher recall is desirable.
  • The classification report shows weighted-average performance above 80%, which is acceptable but can be improved.
  • Let's plot the ROC curve for this model.
In [49]:
# Let's plot the ROC curve (test dataset)

# roc_curve returns fpr : array, shape = [>2]
# Increasing false positive rates such that element i is the false
# positive rate of predictions with score >= thresholds[i].

# tpr : array, shape = [>2]
# Increasing true positive rates such that element i is the true
# positive rate of predictions with score >= thresholds[i].

# thresholds : array, shape = [n_thresholds]
# Decreasing thresholds on the decision function used to compute
# fpr and tpr.

roc_auc_test = roc_auc_score(y_test, test_pred)
fpr, tpr, thresholds = roc_curve(y_test, test_pred)
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Neural network (area = %0.2f)" % roc_auc_test)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver Operating Characteristic Curve")
plt.legend(loc="lower right")
plt.show()
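One caveat: test_pred holds rounded 0/1 labels, so the ROC above is built from a single operating point rather than a full threshold sweep. Working with the raw sigmoid probabilities gives a smoother curve and also exposes the decision threshold as a knob for raising recall. A minimal sketch (the 0.3 threshold is an illustrative assumption, not a tuned value):

# Raw sigmoid outputs in [0, 1] instead of rounded labels
test_probs = model_1.predict(X_test).ravel()

# ROC from probabilities sweeps over all thresholds
fpr, tpr, thresholds = roc_curve(y_test, test_probs)
print("AUC on probabilities: %.3f" % roc_auc_score(y_test, test_probs))

# Lowering the threshold below 0.5 trades precision for recall on class 1
test_pred_low = (test_probs >= 0.3).astype(int)
print(classification_report(y_test, test_pred_low))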
  • Let's try the RMSProp algorithm and check the results before concluding.
In [50]:
# We will be adding the layers sequentially
model_4 = Sequential()
# First hidden layer with 128 neurons and relu activation function, the input_shape tuple denotes number of independent variables
model_4.add(Dense(128, activation = 'relu', input_shape = (11, )))

# Second hidden layer with 64 neurons and relu activation function
model_4.add(Dense(64, activation = 'relu'))

# Third hidden layer with 32 neurons and relu activation function
model_4.add(Dense(32, activation = 'relu'))

# Output layer with only one neuron and sigmoid as activation function which will give the probability of customer churning  
model_4.add(Dense(1, activation = 'sigmoid'))
In [51]:
# Compiling the model with binary crossentropy as loss, RMSprop (learning rate 0.001) as optimizer, and recall as the metric
model_4.compile(loss = 'binary_crossentropy', optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.001), metrics=[tf.keras.metrics.Recall()])
# Printing the summary of the model
model_4.summary()
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_7 (Dense)             (None, 128)               1536      
                                                                 
 dense_8 (Dense)             (None, 64)                8256      
                                                                 
 dense_9 (Dense)             (None, 32)                2080      
                                                                 
 dense_10 (Dense)            (None, 1)                 33        
                                                                 
=================================================================
Total params: 11,905
Trainable params: 11,905
Non-trainable params: 0
_________________________________________________________________
In [52]:
# Fitting the model on the train data
history_4 = model_4.fit(X_train,y_train,validation_split = 0.1,epochs = 50,verbose = 2)
Epoch 1/50
225/225 - 1s - loss: 0.4079 - recall_2: 0.2796 - val_loss: 0.3304 - val_recall_2: 0.4903 - 1s/epoch - 6ms/step
Epoch 2/50
225/225 - 0s - loss: 0.3538 - recall_2: 0.4604 - val_loss: 0.3123 - val_recall_2: 0.4387 - 428ms/epoch - 2ms/step
Epoch 3/50
225/225 - 0s - loss: 0.3447 - recall_2: 0.4794 - val_loss: 0.3083 - val_recall_2: 0.5419 - 408ms/epoch - 2ms/step
Epoch 4/50
225/225 - 0s - loss: 0.3374 - recall_2: 0.4888 - val_loss: 0.3115 - val_recall_2: 0.4903 - 400ms/epoch - 2ms/step
Epoch 5/50
225/225 - 0s - loss: 0.3332 - recall_2: 0.4888 - val_loss: 0.3137 - val_recall_2: 0.5742 - 441ms/epoch - 2ms/step
Epoch 6/50
225/225 - 0s - loss: 0.3292 - recall_2: 0.4997 - val_loss: 0.3119 - val_recall_2: 0.5355 - 391ms/epoch - 2ms/step
Epoch 7/50
225/225 - 0s - loss: 0.3259 - recall_2: 0.4976 - val_loss: 0.3066 - val_recall_2: 0.4710 - 424ms/epoch - 2ms/step
Epoch 8/50
225/225 - 0s - loss: 0.3224 - recall_2: 0.5125 - val_loss: 0.3158 - val_recall_2: 0.4903 - 437ms/epoch - 2ms/step
Epoch 9/50
225/225 - 0s - loss: 0.3192 - recall_2: 0.5146 - val_loss: 0.3121 - val_recall_2: 0.5226 - 433ms/epoch - 2ms/step
Epoch 10/50
225/225 - 0s - loss: 0.3170 - recall_2: 0.5044 - val_loss: 0.3193 - val_recall_2: 0.4387 - 392ms/epoch - 2ms/step
Epoch 11/50
225/225 - 0s - loss: 0.3151 - recall_2: 0.5058 - val_loss: 0.3170 - val_recall_2: 0.5097 - 376ms/epoch - 2ms/step
Epoch 12/50
225/225 - 0s - loss: 0.3116 - recall_2: 0.5315 - val_loss: 0.3198 - val_recall_2: 0.5355 - 415ms/epoch - 2ms/step
Epoch 13/50
225/225 - 0s - loss: 0.3091 - recall_2: 0.5328 - val_loss: 0.3155 - val_recall_2: 0.4645 - 431ms/epoch - 2ms/step
Epoch 14/50
225/225 - 0s - loss: 0.3050 - recall_2: 0.5362 - val_loss: 0.3141 - val_recall_2: 0.5677 - 418ms/epoch - 2ms/step
Epoch 15/50
225/225 - 0s - loss: 0.3014 - recall_2: 0.5423 - val_loss: 0.3174 - val_recall_2: 0.5161 - 424ms/epoch - 2ms/step
Epoch 16/50
225/225 - 0s - loss: 0.3006 - recall_2: 0.5423 - val_loss: 0.3193 - val_recall_2: 0.5226 - 378ms/epoch - 2ms/step
Epoch 17/50
225/225 - 0s - loss: 0.2975 - recall_2: 0.5531 - val_loss: 0.3341 - val_recall_2: 0.4323 - 419ms/epoch - 2ms/step
Epoch 18/50
225/225 - 0s - loss: 0.2929 - recall_2: 0.5579 - val_loss: 0.3248 - val_recall_2: 0.5097 - 410ms/epoch - 2ms/step
Epoch 19/50
225/225 - 0s - loss: 0.2928 - recall_2: 0.5559 - val_loss: 0.3314 - val_recall_2: 0.5677 - 420ms/epoch - 2ms/step
Epoch 20/50
225/225 - 0s - loss: 0.2869 - recall_2: 0.5694 - val_loss: 0.3277 - val_recall_2: 0.4774 - 441ms/epoch - 2ms/step
Epoch 21/50
225/225 - 0s - loss: 0.2855 - recall_2: 0.5674 - val_loss: 0.3420 - val_recall_2: 0.4258 - 366ms/epoch - 2ms/step
Epoch 22/50
225/225 - 0s - loss: 0.2822 - recall_2: 0.5809 - val_loss: 0.3463 - val_recall_2: 0.4194 - 416ms/epoch - 2ms/step
Epoch 23/50
225/225 - 0s - loss: 0.2809 - recall_2: 0.5741 - val_loss: 0.3545 - val_recall_2: 0.4387 - 397ms/epoch - 2ms/step
Epoch 24/50
225/225 - 0s - loss: 0.2773 - recall_2: 0.5836 - val_loss: 0.3374 - val_recall_2: 0.4645 - 412ms/epoch - 2ms/step
Epoch 25/50
225/225 - 0s - loss: 0.2737 - recall_2: 0.5965 - val_loss: 0.3620 - val_recall_2: 0.5806 - 421ms/epoch - 2ms/step
Epoch 26/50
225/225 - 0s - loss: 0.2733 - recall_2: 0.6005 - val_loss: 0.3497 - val_recall_2: 0.5677 - 394ms/epoch - 2ms/step
Epoch 27/50
225/225 - 0s - loss: 0.2689 - recall_2: 0.5965 - val_loss: 0.3484 - val_recall_2: 0.4774 - 414ms/epoch - 2ms/step
Epoch 28/50
225/225 - 0s - loss: 0.2662 - recall_2: 0.6053 - val_loss: 0.3556 - val_recall_2: 0.6065 - 414ms/epoch - 2ms/step
Epoch 29/50
225/225 - 0s - loss: 0.2609 - recall_2: 0.6168 - val_loss: 0.3674 - val_recall_2: 0.4903 - 416ms/epoch - 2ms/step
Epoch 30/50
225/225 - 0s - loss: 0.2616 - recall_2: 0.6087 - val_loss: 0.3671 - val_recall_2: 0.5226 - 386ms/epoch - 2ms/step
Epoch 31/50
225/225 - 0s - loss: 0.2589 - recall_2: 0.6229 - val_loss: 0.3630 - val_recall_2: 0.4774 - 376ms/epoch - 2ms/step
Epoch 32/50
225/225 - 0s - loss: 0.2547 - recall_2: 0.6242 - val_loss: 0.3665 - val_recall_2: 0.5032 - 428ms/epoch - 2ms/step
Epoch 33/50
225/225 - 0s - loss: 0.2533 - recall_2: 0.6344 - val_loss: 0.3618 - val_recall_2: 0.5226 - 408ms/epoch - 2ms/step
Epoch 34/50
225/225 - 0s - loss: 0.2470 - recall_2: 0.6391 - val_loss: 0.3770 - val_recall_2: 0.5484 - 391ms/epoch - 2ms/step
Epoch 35/50
225/225 - 0s - loss: 0.2451 - recall_2: 0.6432 - val_loss: 0.3585 - val_recall_2: 0.5419 - 423ms/epoch - 2ms/step
Epoch 36/50
225/225 - 0s - loss: 0.2433 - recall_2: 0.6486 - val_loss: 0.3834 - val_recall_2: 0.6000 - 376ms/epoch - 2ms/step
Epoch 37/50
225/225 - 0s - loss: 0.2438 - recall_2: 0.6479 - val_loss: 0.3875 - val_recall_2: 0.5806 - 409ms/epoch - 2ms/step
Epoch 38/50
225/225 - 0s - loss: 0.2381 - recall_2: 0.6567 - val_loss: 0.3811 - val_recall_2: 0.5935 - 381ms/epoch - 2ms/step
Epoch 39/50
225/225 - 0s - loss: 0.2355 - recall_2: 0.6547 - val_loss: 0.3852 - val_recall_2: 0.5935 - 387ms/epoch - 2ms/step
Epoch 40/50
225/225 - 0s - loss: 0.2350 - recall_2: 0.6622 - val_loss: 0.3743 - val_recall_2: 0.5677 - 379ms/epoch - 2ms/step
Epoch 41/50
225/225 - 0s - loss: 0.2313 - recall_2: 0.6669 - val_loss: 0.3958 - val_recall_2: 0.5161 - 404ms/epoch - 2ms/step
Epoch 42/50
225/225 - 0s - loss: 0.2297 - recall_2: 0.6730 - val_loss: 0.3860 - val_recall_2: 0.5806 - 415ms/epoch - 2ms/step
Epoch 43/50
225/225 - 0s - loss: 0.2268 - recall_2: 0.6710 - val_loss: 0.4154 - val_recall_2: 0.5742 - 444ms/epoch - 2ms/step
Epoch 44/50
225/225 - 0s - loss: 0.2260 - recall_2: 0.6770 - val_loss: 0.3968 - val_recall_2: 0.5419 - 379ms/epoch - 2ms/step
Epoch 45/50
225/225 - 0s - loss: 0.2222 - recall_2: 0.6750 - val_loss: 0.4069 - val_recall_2: 0.5226 - 417ms/epoch - 2ms/step
Epoch 46/50
225/225 - 0s - loss: 0.2193 - recall_2: 0.6886 - val_loss: 0.3914 - val_recall_2: 0.5290 - 404ms/epoch - 2ms/step
Epoch 47/50
225/225 - 0s - loss: 0.2170 - recall_2: 0.7028 - val_loss: 0.4049 - val_recall_2: 0.5677 - 410ms/epoch - 2ms/step
Epoch 48/50
225/225 - 0s - loss: 0.2184 - recall_2: 0.7055 - val_loss: 0.4167 - val_recall_2: 0.5290 - 403ms/epoch - 2ms/step
Epoch 49/50
225/225 - 0s - loss: 0.2121 - recall_2: 0.7068 - val_loss: 0.4372 - val_recall_2: 0.4774 - 403ms/epoch - 2ms/step
Epoch 50/50
225/225 - 0s - loss: 0.2102 - recall_2: 0.7048 - val_loss: 0.4164 - val_recall_2: 0.5419 - 425ms/epoch - 2ms/step
In [53]:
plt.figure(figsize = (15, 8))
plt.plot(history_4.history['loss'])
plt.plot(history_4.history['val_loss'])
plt.title('loss vs Epochs')
plt.ylabel('loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc = 'lower right')
plt.show()
  • Let's try the RMSprop optimizer with a simpler model, since the model above overfitted the training data.
In [54]:
# We will be adding the layers sequentially
model_5 = Sequential()

# First hidden layer with 128 neurons and relu activation function, the input_shape tuple denotes number of independent variables
model_5.add(Dense(128, activation = 'relu', input_shape = (11, )))

# Second hidden layer with 64 neurons and relu activation function
model_5.add(Dense(64, activation = 'relu'))

# Output layer with only one neuron and sigmoid as activation function which will give the probability of customer exiting
model_5.add(Dense(1, activation = 'sigmoid'))
In [55]:
# Compiling the model with binary crossentropy as loss, RMSprop (learning rate 0.001) as optimizer, and recall as the metric
model_5.compile(loss = 'binary_crossentropy', optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.001), metrics=[tf.keras.metrics.Recall()])
# Printing the summary of the model
model_5.summary()
Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_11 (Dense)            (None, 128)               1536      
                                                                 
 dense_12 (Dense)            (None, 64)                8256      
                                                                 
 dense_13 (Dense)            (None, 1)                 65        
                                                                 
=================================================================
Total params: 9,857
Trainable params: 9,857
Non-trainable params: 0
_________________________________________________________________
In [56]:
# Fitting the model on the train data
history_5 = model_5.fit(X_train,y_train,validation_split = 0.1,epochs = 50,verbose = 2)
Epoch 1/50
225/225 - 1s - loss: 0.4284 - recall_3: 0.2221 - val_loss: 0.3680 - val_recall_3: 0.3871 - 1s/epoch - 6ms/step
Epoch 2/50
225/225 - 0s - loss: 0.3742 - recall_3: 0.3859 - val_loss: 0.3234 - val_recall_3: 0.4323 - 429ms/epoch - 2ms/step
Epoch 3/50
225/225 - 0s - loss: 0.3516 - recall_3: 0.4455 - val_loss: 0.3094 - val_recall_3: 0.5226 - 396ms/epoch - 2ms/step
Epoch 4/50
225/225 - 0s - loss: 0.3415 - recall_3: 0.4766 - val_loss: 0.3106 - val_recall_3: 0.4645 - 358ms/epoch - 2ms/step
Epoch 5/50
225/225 - 0s - loss: 0.3371 - recall_3: 0.4821 - val_loss: 0.3040 - val_recall_3: 0.5548 - 405ms/epoch - 2ms/step
Epoch 6/50
225/225 - 0s - loss: 0.3319 - recall_3: 0.4895 - val_loss: 0.3106 - val_recall_3: 0.5097 - 380ms/epoch - 2ms/step
Epoch 7/50
225/225 - 0s - loss: 0.3290 - recall_3: 0.5003 - val_loss: 0.2998 - val_recall_3: 0.4710 - 404ms/epoch - 2ms/step
Epoch 8/50
225/225 - 0s - loss: 0.3258 - recall_3: 0.5024 - val_loss: 0.3065 - val_recall_3: 0.4581 - 411ms/epoch - 2ms/step
Epoch 9/50
225/225 - 0s - loss: 0.3223 - recall_3: 0.5071 - val_loss: 0.3046 - val_recall_3: 0.4968 - 372ms/epoch - 2ms/step
Epoch 10/50
225/225 - 0s - loss: 0.3198 - recall_3: 0.5132 - val_loss: 0.3095 - val_recall_3: 0.4129 - 407ms/epoch - 2ms/step
Epoch 11/50
225/225 - 0s - loss: 0.3187 - recall_3: 0.4997 - val_loss: 0.3089 - val_recall_3: 0.4903 - 416ms/epoch - 2ms/step
Epoch 12/50
225/225 - 0s - loss: 0.3155 - recall_3: 0.5193 - val_loss: 0.3132 - val_recall_3: 0.4774 - 409ms/epoch - 2ms/step
Epoch 13/50
225/225 - 0s - loss: 0.3135 - recall_3: 0.5301 - val_loss: 0.3130 - val_recall_3: 0.4516 - 408ms/epoch - 2ms/step
Epoch 14/50
225/225 - 0s - loss: 0.3099 - recall_3: 0.5159 - val_loss: 0.3101 - val_recall_3: 0.5226 - 423ms/epoch - 2ms/step
Epoch 15/50
225/225 - 0s - loss: 0.3079 - recall_3: 0.5349 - val_loss: 0.3139 - val_recall_3: 0.4774 - 382ms/epoch - 2ms/step
Epoch 16/50
225/225 - 0s - loss: 0.3065 - recall_3: 0.5403 - val_loss: 0.3119 - val_recall_3: 0.5097 - 362ms/epoch - 2ms/step
Epoch 17/50
225/225 - 0s - loss: 0.3052 - recall_3: 0.5423 - val_loss: 0.3257 - val_recall_3: 0.4129 - 405ms/epoch - 2ms/step
Epoch 18/50
225/225 - 1s - loss: 0.3024 - recall_3: 0.5437 - val_loss: 0.3154 - val_recall_3: 0.4774 - 947ms/epoch - 4ms/step
Epoch 19/50
225/225 - 0s - loss: 0.3013 - recall_3: 0.5376 - val_loss: 0.3177 - val_recall_3: 0.5548 - 496ms/epoch - 2ms/step
Epoch 20/50
225/225 - 0s - loss: 0.2983 - recall_3: 0.5484 - val_loss: 0.3223 - val_recall_3: 0.4258 - 397ms/epoch - 2ms/step
Epoch 21/50
225/225 - 0s - loss: 0.2960 - recall_3: 0.5498 - val_loss: 0.3282 - val_recall_3: 0.4581 - 386ms/epoch - 2ms/step
Epoch 22/50
225/225 - 0s - loss: 0.2937 - recall_3: 0.5599 - val_loss: 0.3338 - val_recall_3: 0.4258 - 369ms/epoch - 2ms/step
Epoch 23/50
225/225 - 0s - loss: 0.2934 - recall_3: 0.5538 - val_loss: 0.3299 - val_recall_3: 0.4645 - 401ms/epoch - 2ms/step
Epoch 24/50
225/225 - 0s - loss: 0.2898 - recall_3: 0.5572 - val_loss: 0.3225 - val_recall_3: 0.4581 - 381ms/epoch - 2ms/step
Epoch 25/50
225/225 - 0s - loss: 0.2882 - recall_3: 0.5674 - val_loss: 0.3328 - val_recall_3: 0.4968 - 396ms/epoch - 2ms/step
Epoch 26/50
225/225 - 0s - loss: 0.2881 - recall_3: 0.5647 - val_loss: 0.3237 - val_recall_3: 0.5097 - 378ms/epoch - 2ms/step
Epoch 27/50
225/225 - 0s - loss: 0.2854 - recall_3: 0.5674 - val_loss: 0.3351 - val_recall_3: 0.4645 - 373ms/epoch - 2ms/step
Epoch 28/50
225/225 - 0s - loss: 0.2829 - recall_3: 0.5768 - val_loss: 0.3316 - val_recall_3: 0.5871 - 364ms/epoch - 2ms/step
Epoch 29/50
225/225 - 0s - loss: 0.2802 - recall_3: 0.5856 - val_loss: 0.3374 - val_recall_3: 0.4968 - 375ms/epoch - 2ms/step
Epoch 30/50
225/225 - 0s - loss: 0.2806 - recall_3: 0.5850 - val_loss: 0.3346 - val_recall_3: 0.5548 - 360ms/epoch - 2ms/step
Epoch 31/50
225/225 - 0s - loss: 0.2783 - recall_3: 0.5863 - val_loss: 0.3412 - val_recall_3: 0.4452 - 409ms/epoch - 2ms/step
Epoch 32/50
225/225 - 0s - loss: 0.2763 - recall_3: 0.5836 - val_loss: 0.3309 - val_recall_3: 0.4968 - 356ms/epoch - 2ms/step
Epoch 33/50
225/225 - 0s - loss: 0.2738 - recall_3: 0.5972 - val_loss: 0.3359 - val_recall_3: 0.4903 - 415ms/epoch - 2ms/step
Epoch 34/50
225/225 - 0s - loss: 0.2714 - recall_3: 0.6012 - val_loss: 0.3427 - val_recall_3: 0.4645 - 375ms/epoch - 2ms/step
Epoch 35/50
225/225 - 0s - loss: 0.2699 - recall_3: 0.5978 - val_loss: 0.3378 - val_recall_3: 0.4645 - 354ms/epoch - 2ms/step
Epoch 36/50
225/225 - 0s - loss: 0.2694 - recall_3: 0.5917 - val_loss: 0.3399 - val_recall_3: 0.5613 - 374ms/epoch - 2ms/step
Epoch 37/50
225/225 - 0s - loss: 0.2688 - recall_3: 0.5951 - val_loss: 0.3406 - val_recall_3: 0.5290 - 388ms/epoch - 2ms/step
Epoch 38/50
225/225 - 0s - loss: 0.2648 - recall_3: 0.6019 - val_loss: 0.3416 - val_recall_3: 0.5742 - 389ms/epoch - 2ms/step
Epoch 39/50
225/225 - 0s - loss: 0.2631 - recall_3: 0.6039 - val_loss: 0.3418 - val_recall_3: 0.5677 - 440ms/epoch - 2ms/step
Epoch 40/50
225/225 - 0s - loss: 0.2626 - recall_3: 0.6100 - val_loss: 0.3365 - val_recall_3: 0.5548 - 362ms/epoch - 2ms/step
Epoch 41/50
225/225 - 0s - loss: 0.2612 - recall_3: 0.6134 - val_loss: 0.3454 - val_recall_3: 0.5032 - 402ms/epoch - 2ms/step
Epoch 42/50
225/225 - 0s - loss: 0.2596 - recall_3: 0.6134 - val_loss: 0.3415 - val_recall_3: 0.5548 - 407ms/epoch - 2ms/step
Epoch 43/50
225/225 - 0s - loss: 0.2573 - recall_3: 0.6175 - val_loss: 0.3587 - val_recall_3: 0.5097 - 361ms/epoch - 2ms/step
Epoch 44/50
225/225 - 0s - loss: 0.2563 - recall_3: 0.6161 - val_loss: 0.3480 - val_recall_3: 0.5032 - 412ms/epoch - 2ms/step
Epoch 45/50
225/225 - 0s - loss: 0.2552 - recall_3: 0.6222 - val_loss: 0.3576 - val_recall_3: 0.5161 - 405ms/epoch - 2ms/step
Epoch 46/50
225/225 - 0s - loss: 0.2522 - recall_3: 0.6439 - val_loss: 0.3518 - val_recall_3: 0.5161 - 382ms/epoch - 2ms/step
Epoch 47/50
225/225 - 0s - loss: 0.2513 - recall_3: 0.6256 - val_loss: 0.3479 - val_recall_3: 0.5613 - 372ms/epoch - 2ms/step
Epoch 48/50
225/225 - 0s - loss: 0.2492 - recall_3: 0.6344 - val_loss: 0.3665 - val_recall_3: 0.4710 - 408ms/epoch - 2ms/step
Epoch 49/50
225/225 - 0s - loss: 0.2469 - recall_3: 0.6452 - val_loss: 0.3580 - val_recall_3: 0.4903 - 362ms/epoch - 2ms/step
Epoch 50/50
225/225 - 0s - loss: 0.2449 - recall_3: 0.6357 - val_loss: 0.3525 - val_recall_3: 0.4839 - 404ms/epoch - 2ms/step
In [57]:
plt.figure(figsize = (15, 8))
plt.plot(history_5.history['loss'])
plt.plot(history_5.history['val_loss'])
plt.title('loss vs Epochs')
plt.ylabel('loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc = 'lower right')
plt.show()
In [78]:
# We use SMOTE to oversample the minority class as a way to handle the class imbalance
from imblearn.over_sampling import SMOTE

print("Before Oversampling")
print(y_train.value_counts())


sm = SMOTE(
    sampling_strategy=1, k_neighbors=5, random_state=1
)  # Synthetic Minority Over Sampling Technique
X_train_over, y_train_over = sm.fit_resample(X_train, y_train)


print("After Oversampling")
print(y_train_over.value_counts())

print("After Oversampling, the shape of train_X: {}".format(X_train_over.shape))
print("After Oversampling, the shape of train_y: {} \n".format(y_train_over.shape))
Before Oversampling
Exited
0         6368
1         1632
dtype: int64
After Oversampling
Exited
0         6368
1         6368
dtype: int64
After Oversampling, the shape of train_X: (12736, 11)
After Oversampling, the shape of train_y: (12736, 1) 

In [79]:
# We will be adding the layers sequentially
model_6 = Sequential()

# First hidden layer with 128 neurons and relu activation function, the input_shape tuple denotes number of independent variables
model_6.add(Dense(128, activation = 'relu', input_shape = (11, )))

# Second hidden layer with 64 neurons and relu activation function
model_6.add(Dense(64, activation = 'relu'))

# Output layer with only one neuron and sigmoid as activation function which will give the probability of customer exiting
model_6.add(Dense(1, activation = 'sigmoid'))

# Compiling the model with binary crossentropy as loss, SGD as optimizer, and recall as the metric
model_6.compile(loss = 'binary_crossentropy', optimizer = 'SGD', metrics=[tf.keras.metrics.Recall()])

history_6 = model_6.fit(X_train_over, y_train_over,validation_split = 0.1,epochs = 150,verbose = 2)

model_6.evaluate(X_test, y_test, verbose = 1)

test_pred = np.round(model_6.predict(X_test))
Epoch 1/150
359/359 - 1s - loss: 0.6328 - recall_6: 0.4692 - val_loss: 0.6663 - val_recall_6: 0.5824 - 1s/epoch - 4ms/step
Epoch 2/150
359/359 - 1s - loss: 0.5672 - recall_6: 0.6158 - val_loss: 0.6616 - val_recall_6: 0.5973 - 591ms/epoch - 2ms/step
Epoch 3/150
359/359 - 1s - loss: 0.5369 - recall_6: 0.6474 - val_loss: 0.6498 - val_recall_6: 0.6248 - 585ms/epoch - 2ms/step
Epoch 4/150
359/359 - 1s - loss: 0.5161 - recall_6: 0.6647 - val_loss: 0.6014 - val_recall_6: 0.6633 - 583ms/epoch - 2ms/step
Epoch 5/150
359/359 - 1s - loss: 0.4970 - recall_6: 0.6853 - val_loss: 0.5620 - val_recall_6: 0.7111 - 602ms/epoch - 2ms/step
Epoch 6/150
359/359 - 1s - loss: 0.4797 - recall_6: 0.7079 - val_loss: 0.5808 - val_recall_6: 0.7064 - 574ms/epoch - 2ms/step
Epoch 7/150
359/359 - 1s - loss: 0.4655 - recall_6: 0.7181 - val_loss: 0.4993 - val_recall_6: 0.7598 - 558ms/epoch - 2ms/step
Epoch 8/150
359/359 - 1s - loss: 0.4546 - recall_6: 0.7314 - val_loss: 0.5488 - val_recall_6: 0.7276 - 569ms/epoch - 2ms/step
Epoch 9/150
359/359 - 1s - loss: 0.4471 - recall_6: 0.7326 - val_loss: 0.5100 - val_recall_6: 0.7520 - 581ms/epoch - 2ms/step
Epoch 10/150
359/359 - 1s - loss: 0.4415 - recall_6: 0.7344 - val_loss: 0.5158 - val_recall_6: 0.7449 - 564ms/epoch - 2ms/step
Epoch 11/150
359/359 - 1s - loss: 0.4376 - recall_6: 0.7371 - val_loss: 0.5184 - val_recall_6: 0.7449 - 602ms/epoch - 2ms/step
Epoch 12/150
359/359 - 1s - loss: 0.4345 - recall_6: 0.7405 - val_loss: 0.4969 - val_recall_6: 0.7543 - 581ms/epoch - 2ms/step
Epoch 13/150
359/359 - 1s - loss: 0.4320 - recall_6: 0.7401 - val_loss: 0.4726 - val_recall_6: 0.7692 - 586ms/epoch - 2ms/step
Epoch 14/150
359/359 - 1s - loss: 0.4295 - recall_6: 0.7448 - val_loss: 0.5158 - val_recall_6: 0.7425 - 569ms/epoch - 2ms/step
Epoch 15/150
359/359 - 1s - loss: 0.4271 - recall_6: 0.7436 - val_loss: 0.4773 - val_recall_6: 0.7724 - 575ms/epoch - 2ms/step
Epoch 16/150
359/359 - 1s - loss: 0.4254 - recall_6: 0.7450 - val_loss: 0.4769 - val_recall_6: 0.7724 - 572ms/epoch - 2ms/step
Epoch 17/150
359/359 - 1s - loss: 0.4237 - recall_6: 0.7468 - val_loss: 0.5049 - val_recall_6: 0.7527 - 570ms/epoch - 2ms/step
Epoch 18/150
359/359 - 1s - loss: 0.4220 - recall_6: 0.7466 - val_loss: 0.5015 - val_recall_6: 0.7575 - 553ms/epoch - 2ms/step
Epoch 19/150
359/359 - 1s - loss: 0.4205 - recall_6: 0.7515 - val_loss: 0.5029 - val_recall_6: 0.7520 - 593ms/epoch - 2ms/step
Epoch 20/150
359/359 - 1s - loss: 0.4187 - recall_6: 0.7477 - val_loss: 0.4789 - val_recall_6: 0.7653 - 564ms/epoch - 2ms/step
Epoch 21/150
359/359 - 1s - loss: 0.4171 - recall_6: 0.7507 - val_loss: 0.4728 - val_recall_6: 0.7708 - 575ms/epoch - 2ms/step
Epoch 22/150
359/359 - 1s - loss: 0.4156 - recall_6: 0.7544 - val_loss: 0.4454 - val_recall_6: 0.7928 - 592ms/epoch - 2ms/step
Epoch 23/150
359/359 - 1s - loss: 0.4144 - recall_6: 0.7542 - val_loss: 0.4538 - val_recall_6: 0.7881 - 593ms/epoch - 2ms/step
Epoch 24/150
359/359 - 1s - loss: 0.4127 - recall_6: 0.7536 - val_loss: 0.4483 - val_recall_6: 0.7865 - 587ms/epoch - 2ms/step
Epoch 25/150
359/359 - 1s - loss: 0.4113 - recall_6: 0.7556 - val_loss: 0.4526 - val_recall_6: 0.7834 - 558ms/epoch - 2ms/step
Epoch 26/150
359/359 - 1s - loss: 0.4103 - recall_6: 0.7580 - val_loss: 0.4537 - val_recall_6: 0.7865 - 598ms/epoch - 2ms/step
Epoch 27/150
359/359 - 1s - loss: 0.4085 - recall_6: 0.7587 - val_loss: 0.4817 - val_recall_6: 0.7692 - 581ms/epoch - 2ms/step
Epoch 28/150
359/359 - 1s - loss: 0.4070 - recall_6: 0.7607 - val_loss: 0.5039 - val_recall_6: 0.7551 - 576ms/epoch - 2ms/step
Epoch 29/150
359/359 - 1s - loss: 0.4059 - recall_6: 0.7605 - val_loss: 0.4589 - val_recall_6: 0.7818 - 568ms/epoch - 2ms/step
Epoch 30/150
359/359 - 1s - loss: 0.4044 - recall_6: 0.7636 - val_loss: 0.4717 - val_recall_6: 0.7724 - 579ms/epoch - 2ms/step
Epoch 31/150
359/359 - 1s - loss: 0.4028 - recall_6: 0.7597 - val_loss: 0.4294 - val_recall_6: 0.7983 - 578ms/epoch - 2ms/step
Epoch 32/150
359/359 - 1s - loss: 0.4015 - recall_6: 0.7636 - val_loss: 0.4303 - val_recall_6: 0.7920 - 560ms/epoch - 2ms/step
Epoch 33/150
359/359 - 1s - loss: 0.4003 - recall_6: 0.7664 - val_loss: 0.5161 - val_recall_6: 0.7480 - 579ms/epoch - 2ms/step
Epoch 34/150
359/359 - 1s - loss: 0.3989 - recall_6: 0.7654 - val_loss: 0.3994 - val_recall_6: 0.8116 - 564ms/epoch - 2ms/step
Epoch 35/150
359/359 - 1s - loss: 0.3974 - recall_6: 0.7691 - val_loss: 0.5015 - val_recall_6: 0.7512 - 578ms/epoch - 2ms/step
Epoch 36/150
359/359 - 1s - loss: 0.3956 - recall_6: 0.7670 - val_loss: 0.4613 - val_recall_6: 0.7724 - 611ms/epoch - 2ms/step
Epoch 37/150
359/359 - 1s - loss: 0.3945 - recall_6: 0.7684 - val_loss: 0.4536 - val_recall_6: 0.7786 - 580ms/epoch - 2ms/step
Epoch 38/150
359/359 - 1s - loss: 0.3931 - recall_6: 0.7709 - val_loss: 0.4541 - val_recall_6: 0.7841 - 566ms/epoch - 2ms/step
Epoch 39/150
359/359 - 1s - loss: 0.3921 - recall_6: 0.7689 - val_loss: 0.4123 - val_recall_6: 0.8085 - 571ms/epoch - 2ms/step
Epoch 40/150
359/359 - 1s - loss: 0.3904 - recall_6: 0.7699 - val_loss: 0.4180 - val_recall_6: 0.8022 - 619ms/epoch - 2ms/step
Epoch 41/150
359/359 - 1s - loss: 0.3893 - recall_6: 0.7737 - val_loss: 0.3937 - val_recall_6: 0.8116 - 580ms/epoch - 2ms/step
Epoch 42/150
359/359 - 1s - loss: 0.3885 - recall_6: 0.7744 - val_loss: 0.4602 - val_recall_6: 0.7747 - 557ms/epoch - 2ms/step
Epoch 43/150
359/359 - 1s - loss: 0.3863 - recall_6: 0.7764 - val_loss: 0.4420 - val_recall_6: 0.7889 - 588ms/epoch - 2ms/step
Epoch 44/150
359/359 - 1s - loss: 0.3857 - recall_6: 0.7744 - val_loss: 0.4407 - val_recall_6: 0.7849 - 548ms/epoch - 2ms/step
Epoch 45/150
359/359 - 1s - loss: 0.3843 - recall_6: 0.7748 - val_loss: 0.4255 - val_recall_6: 0.7959 - 578ms/epoch - 2ms/step
Epoch 46/150
359/359 - 1s - loss: 0.3827 - recall_6: 0.7797 - val_loss: 0.4424 - val_recall_6: 0.7841 - 581ms/epoch - 2ms/step
Epoch 47/150
359/359 - 1s - loss: 0.3815 - recall_6: 0.7729 - val_loss: 0.4217 - val_recall_6: 0.7983 - 567ms/epoch - 2ms/step
Epoch 48/150
359/359 - 1s - loss: 0.3803 - recall_6: 0.7772 - val_loss: 0.4615 - val_recall_6: 0.7755 - 565ms/epoch - 2ms/step
Epoch 49/150
359/359 - 1s - loss: 0.3791 - recall_6: 0.7752 - val_loss: 0.4129 - val_recall_6: 0.8030 - 579ms/epoch - 2ms/step
Epoch 50/150
359/359 - 1s - loss: 0.3778 - recall_6: 0.7740 - val_loss: 0.4010 - val_recall_6: 0.8108 - 595ms/epoch - 2ms/step
Epoch 51/150
359/359 - 1s - loss: 0.3762 - recall_6: 0.7772 - val_loss: 0.4320 - val_recall_6: 0.7936 - 558ms/epoch - 2ms/step
Epoch 52/150
359/359 - 1s - loss: 0.3753 - recall_6: 0.7784 - val_loss: 0.3619 - val_recall_6: 0.8352 - 580ms/epoch - 2ms/step
Epoch 53/150
359/359 - 1s - loss: 0.3740 - recall_6: 0.7797 - val_loss: 0.4183 - val_recall_6: 0.7959 - 563ms/epoch - 2ms/step
Epoch 54/150
359/359 - 1s - loss: 0.3724 - recall_6: 0.7813 - val_loss: 0.4120 - val_recall_6: 0.8069 - 602ms/epoch - 2ms/step
Epoch 55/150
359/359 - 1s - loss: 0.3707 - recall_6: 0.7878 - val_loss: 0.3771 - val_recall_6: 0.8250 - 593ms/epoch - 2ms/step
Epoch 56/150
359/359 - 1s - loss: 0.3697 - recall_6: 0.7845 - val_loss: 0.4030 - val_recall_6: 0.8046 - 591ms/epoch - 2ms/step
Epoch 57/150
359/359 - 1s - loss: 0.3682 - recall_6: 0.7868 - val_loss: 0.4247 - val_recall_6: 0.7928 - 596ms/epoch - 2ms/step
Epoch 58/150
359/359 - 1s - loss: 0.3666 - recall_6: 0.7878 - val_loss: 0.4123 - val_recall_6: 0.7991 - 575ms/epoch - 2ms/step
Epoch 59/150
359/359 - 1s - loss: 0.3654 - recall_6: 0.7852 - val_loss: 0.4096 - val_recall_6: 0.7983 - 575ms/epoch - 2ms/step
Epoch 60/150
359/359 - 1s - loss: 0.3640 - recall_6: 0.7866 - val_loss: 0.3858 - val_recall_6: 0.8203 - 583ms/epoch - 2ms/step
Epoch 61/150
359/359 - 1s - loss: 0.3630 - recall_6: 0.7923 - val_loss: 0.3921 - val_recall_6: 0.8179 - 591ms/epoch - 2ms/step
Epoch 62/150
359/359 - 1s - loss: 0.3613 - recall_6: 0.7939 - val_loss: 0.4617 - val_recall_6: 0.7637 - 588ms/epoch - 2ms/step
Epoch 63/150
359/359 - 1s - loss: 0.3602 - recall_6: 0.7892 - val_loss: 0.3579 - val_recall_6: 0.8336 - 575ms/epoch - 2ms/step
Epoch 64/150
359/359 - 1s - loss: 0.3587 - recall_6: 0.7986 - val_loss: 0.3625 - val_recall_6: 0.8265 - 591ms/epoch - 2ms/step
Epoch 65/150
359/359 - 1s - loss: 0.3571 - recall_6: 0.7913 - val_loss: 0.2995 - val_recall_6: 0.8791 - 574ms/epoch - 2ms/step
Epoch 66/150
359/359 - 1s - loss: 0.3558 - recall_6: 0.7982 - val_loss: 0.3248 - val_recall_6: 0.8540 - 593ms/epoch - 2ms/step
Epoch 67/150
359/359 - 1s - loss: 0.3538 - recall_6: 0.7976 - val_loss: 0.3621 - val_recall_6: 0.8273 - 557ms/epoch - 2ms/step
Epoch 68/150
359/359 - 1s - loss: 0.3528 - recall_6: 0.7980 - val_loss: 0.3668 - val_recall_6: 0.8336 - 601ms/epoch - 2ms/step
Epoch 69/150
359/359 - 1s - loss: 0.3515 - recall_6: 0.7974 - val_loss: 0.4046 - val_recall_6: 0.8069 - 592ms/epoch - 2ms/step
Epoch 70/150
359/359 - 1s - loss: 0.3498 - recall_6: 0.7988 - val_loss: 0.3604 - val_recall_6: 0.8320 - 613ms/epoch - 2ms/step
Epoch 71/150
359/359 - 1s - loss: 0.3489 - recall_6: 0.8021 - val_loss: 0.3983 - val_recall_6: 0.8069 - 569ms/epoch - 2ms/step
Epoch 72/150
359/359 - 1s - loss: 0.3477 - recall_6: 0.8053 - val_loss: 0.4168 - val_recall_6: 0.7951 - 583ms/epoch - 2ms/step
Epoch 73/150
359/359 - 1s - loss: 0.3452 - recall_6: 0.8013 - val_loss: 0.3126 - val_recall_6: 0.8634 - 616ms/epoch - 2ms/step
Epoch 74/150
359/359 - 1s - loss: 0.3450 - recall_6: 0.8060 - val_loss: 0.3618 - val_recall_6: 0.8352 - 587ms/epoch - 2ms/step
Epoch 75/150
359/359 - 1s - loss: 0.3430 - recall_6: 0.8066 - val_loss: 0.3364 - val_recall_6: 0.8532 - 590ms/epoch - 2ms/step
Epoch 76/150
359/359 - 1s - loss: 0.3413 - recall_6: 0.8113 - val_loss: 0.3435 - val_recall_6: 0.8438 - 598ms/epoch - 2ms/step
Epoch 77/150
359/359 - 1s - loss: 0.3398 - recall_6: 0.8119 - val_loss: 0.4148 - val_recall_6: 0.7912 - 582ms/epoch - 2ms/step
Epoch 78/150
359/359 - 1s - loss: 0.3382 - recall_6: 0.8102 - val_loss: 0.2857 - val_recall_6: 0.8823 - 600ms/epoch - 2ms/step
Epoch 79/150
359/359 - 1s - loss: 0.3376 - recall_6: 0.8131 - val_loss: 0.4387 - val_recall_6: 0.7881 - 569ms/epoch - 2ms/step
Epoch 80/150
359/359 - 1s - loss: 0.3356 - recall_6: 0.8153 - val_loss: 0.3592 - val_recall_6: 0.8312 - 604ms/epoch - 2ms/step
Epoch 81/150
359/359 - 1s - loss: 0.3352 - recall_6: 0.8145 - val_loss: 0.3876 - val_recall_6: 0.8171 - 575ms/epoch - 2ms/step
Epoch 82/150
359/359 - 1s - loss: 0.3325 - recall_6: 0.8131 - val_loss: 0.3689 - val_recall_6: 0.8336 - 577ms/epoch - 2ms/step
Epoch 83/150
359/359 - 1s - loss: 0.3318 - recall_6: 0.8174 - val_loss: 0.3659 - val_recall_6: 0.8312 - 588ms/epoch - 2ms/step
Epoch 84/150
359/359 - 1s - loss: 0.3299 - recall_6: 0.8208 - val_loss: 0.3453 - val_recall_6: 0.8446 - 591ms/epoch - 2ms/step
Epoch 85/150
359/359 - 1s - loss: 0.3283 - recall_6: 0.8204 - val_loss: 0.3006 - val_recall_6: 0.8721 - 590ms/epoch - 2ms/step
Epoch 86/150
359/359 - 1s - loss: 0.3272 - recall_6: 0.8235 - val_loss: 0.3176 - val_recall_6: 0.8619 - 587ms/epoch - 2ms/step
Epoch 87/150
359/359 - 1s - loss: 0.3249 - recall_6: 0.8225 - val_loss: 0.3720 - val_recall_6: 0.8218 - 598ms/epoch - 2ms/step
Epoch 88/150
359/359 - 1s - loss: 0.3236 - recall_6: 0.8231 - val_loss: 0.3012 - val_recall_6: 0.8689 - 591ms/epoch - 2ms/step
Epoch 89/150
359/359 - 1s - loss: 0.3234 - recall_6: 0.8302 - val_loss: 0.2502 - val_recall_6: 0.8964 - 574ms/epoch - 2ms/step
Epoch 90/150
359/359 - 1s - loss: 0.3208 - recall_6: 0.8288 - val_loss: 0.2831 - val_recall_6: 0.8807 - 597ms/epoch - 2ms/step
Epoch 91/150
359/359 - 1s - loss: 0.3195 - recall_6: 0.8280 - val_loss: 0.3359 - val_recall_6: 0.8469 - 568ms/epoch - 2ms/step
Epoch 92/150
359/359 - 1s - loss: 0.3175 - recall_6: 0.8294 - val_loss: 0.3114 - val_recall_6: 0.8579 - 595ms/epoch - 2ms/step
Epoch 93/150
359/359 - 1s - loss: 0.3159 - recall_6: 0.8337 - val_loss: 0.3074 - val_recall_6: 0.8571 - 610ms/epoch - 2ms/step
Epoch 94/150
359/359 - 1s - loss: 0.3152 - recall_6: 0.8265 - val_loss: 0.2341 - val_recall_6: 0.9074 - 596ms/epoch - 2ms/step
Epoch 95/150
359/359 - 1s - loss: 0.3150 - recall_6: 0.8353 - val_loss: 0.2669 - val_recall_6: 0.8987 - 597ms/epoch - 2ms/step
Epoch 96/150
359/359 - 1s - loss: 0.3118 - recall_6: 0.8363 - val_loss: 0.3766 - val_recall_6: 0.8155 - 587ms/epoch - 2ms/step
Epoch 97/150
359/359 - 1s - loss: 0.3111 - recall_6: 0.8388 - val_loss: 0.3819 - val_recall_6: 0.8218 - 607ms/epoch - 2ms/step
Epoch 98/150
359/359 - 1s - loss: 0.3084 - recall_6: 0.8400 - val_loss: 0.4267 - val_recall_6: 0.7951 - 580ms/epoch - 2ms/step
Epoch 99/150
359/359 - 1s - loss: 0.3079 - recall_6: 0.8327 - val_loss: 0.4533 - val_recall_6: 0.7771 - 570ms/epoch - 2ms/step
Epoch 100/150
359/359 - 1s - loss: 0.3071 - recall_6: 0.8351 - val_loss: 0.3585 - val_recall_6: 0.8352 - 600ms/epoch - 2ms/step
Epoch 101/150
359/359 - 1s - loss: 0.3047 - recall_6: 0.8396 - val_loss: 0.4278 - val_recall_6: 0.7951 - 587ms/epoch - 2ms/step
Epoch 102/150
359/359 - 1s - loss: 0.3031 - recall_6: 0.8459 - val_loss: 0.2926 - val_recall_6: 0.8752 - 589ms/epoch - 2ms/step
Epoch 103/150
359/359 - 1s - loss: 0.3016 - recall_6: 0.8414 - val_loss: 0.3256 - val_recall_6: 0.8485 - 590ms/epoch - 2ms/step
Epoch 104/150
359/359 - 1s - loss: 0.3009 - recall_6: 0.8469 - val_loss: 0.4006 - val_recall_6: 0.8093 - 604ms/epoch - 2ms/step
Epoch 105/150
359/359 - 1s - loss: 0.2995 - recall_6: 0.8441 - val_loss: 0.3468 - val_recall_6: 0.8438 - 596ms/epoch - 2ms/step
Epoch 106/150
359/359 - 1s - loss: 0.2972 - recall_6: 0.8492 - val_loss: 0.4288 - val_recall_6: 0.7889 - 568ms/epoch - 2ms/step
Epoch 107/150
359/359 - 1s - loss: 0.2973 - recall_6: 0.8508 - val_loss: 0.3056 - val_recall_6: 0.8728 - 613ms/epoch - 2ms/step
Epoch 108/150
359/359 - 1s - loss: 0.2959 - recall_6: 0.8483 - val_loss: 0.4923 - val_recall_6: 0.7465 - 565ms/epoch - 2ms/step
Epoch 109/150
359/359 - 1s - loss: 0.2945 - recall_6: 0.8508 - val_loss: 0.3417 - val_recall_6: 0.8485 - 627ms/epoch - 2ms/step
Epoch 110/150
359/359 - 1s - loss: 0.2927 - recall_6: 0.8508 - val_loss: 0.2541 - val_recall_6: 0.8940 - 577ms/epoch - 2ms/step
Epoch 111/150
359/359 - 1s - loss: 0.2925 - recall_6: 0.8498 - val_loss: 0.3134 - val_recall_6: 0.8603 - 618ms/epoch - 2ms/step
Epoch 112/150
359/359 - 1s - loss: 0.2894 - recall_6: 0.8565 - val_loss: 0.2345 - val_recall_6: 0.9105 - 570ms/epoch - 2ms/step
Epoch 113/150
359/359 - 1s - loss: 0.2883 - recall_6: 0.8534 - val_loss: 0.4169 - val_recall_6: 0.8093 - 582ms/epoch - 2ms/step
Epoch 114/150
359/359 - 1s - loss: 0.2866 - recall_6: 0.8547 - val_loss: 0.3714 - val_recall_6: 0.8359 - 587ms/epoch - 2ms/step
Epoch 115/150
359/359 - 1s - loss: 0.2871 - recall_6: 0.8536 - val_loss: 0.3546 - val_recall_6: 0.8422 - 557ms/epoch - 2ms/step
Epoch 116/150
359/359 - 1s - loss: 0.2851 - recall_6: 0.8571 - val_loss: 0.3393 - val_recall_6: 0.8438 - 589ms/epoch - 2ms/step
Epoch 117/150
359/359 - 1s - loss: 0.2825 - recall_6: 0.8590 - val_loss: 0.2926 - val_recall_6: 0.8697 - 566ms/epoch - 2ms/step
Epoch 118/150
359/359 - 1s - loss: 0.2825 - recall_6: 0.8589 - val_loss: 0.4937 - val_recall_6: 0.7504 - 566ms/epoch - 2ms/step
Epoch 119/150
359/359 - 1s - loss: 0.2814 - recall_6: 0.8587 - val_loss: 0.3009 - val_recall_6: 0.8697 - 581ms/epoch - 2ms/step
Epoch 120/150
359/359 - 1s - loss: 0.2798 - recall_6: 0.8626 - val_loss: 0.1867 - val_recall_6: 0.9184 - 591ms/epoch - 2ms/step
Epoch 121/150
359/359 - 1s - loss: 0.2783 - recall_6: 0.8642 - val_loss: 0.4176 - val_recall_6: 0.8108 - 582ms/epoch - 2ms/step
Epoch 122/150
359/359 - 1s - loss: 0.2788 - recall_6: 0.8600 - val_loss: 0.3870 - val_recall_6: 0.8116 - 565ms/epoch - 2ms/step
Epoch 123/150
359/359 - 1s - loss: 0.2778 - recall_6: 0.8640 - val_loss: 0.3303 - val_recall_6: 0.8469 - 588ms/epoch - 2ms/step
Epoch 124/150
359/359 - 1s - loss: 0.2749 - recall_6: 0.8647 - val_loss: 0.5781 - val_recall_6: 0.7017 - 580ms/epoch - 2ms/step
Epoch 125/150
359/359 - 1s - loss: 0.2733 - recall_6: 0.8655 - val_loss: 0.3401 - val_recall_6: 0.8485 - 587ms/epoch - 2ms/step
Epoch 126/150
359/359 - 1s - loss: 0.2731 - recall_6: 0.8642 - val_loss: 0.2850 - val_recall_6: 0.8783 - 599ms/epoch - 2ms/step
Epoch 127/150
359/359 - 1s - loss: 0.2703 - recall_6: 0.8685 - val_loss: 0.3265 - val_recall_6: 0.8548 - 567ms/epoch - 2ms/step
Epoch 128/150
359/359 - 1s - loss: 0.2691 - recall_6: 0.8722 - val_loss: 0.3078 - val_recall_6: 0.8666 - 586ms/epoch - 2ms/step
Epoch 129/150
359/359 - 1s - loss: 0.2685 - recall_6: 0.8685 - val_loss: 0.2134 - val_recall_6: 0.9199 - 563ms/epoch - 2ms/step
Epoch 130/150
359/359 - 1s - loss: 0.2676 - recall_6: 0.8693 - val_loss: 0.4146 - val_recall_6: 0.7967 - 582ms/epoch - 2ms/step
Epoch 131/150
359/359 - 1s - loss: 0.2667 - recall_6: 0.8732 - val_loss: 0.2342 - val_recall_6: 0.9137 - 575ms/epoch - 2ms/step
Epoch 132/150
359/359 - 1s - loss: 0.2651 - recall_6: 0.8732 - val_loss: 0.3472 - val_recall_6: 0.8422 - 584ms/epoch - 2ms/step
Epoch 133/150
359/359 - 1s - loss: 0.2644 - recall_6: 0.8742 - val_loss: 0.3875 - val_recall_6: 0.8171 - 577ms/epoch - 2ms/step
Epoch 134/150
359/359 - 1s - loss: 0.2637 - recall_6: 0.8732 - val_loss: 0.2387 - val_recall_6: 0.8995 - 572ms/epoch - 2ms/step
Epoch 135/150
359/359 - 1s - loss: 0.2612 - recall_6: 0.8750 - val_loss: 0.3132 - val_recall_6: 0.8603 - 587ms/epoch - 2ms/step
Epoch 136/150
359/359 - 1s - loss: 0.2608 - recall_6: 0.8751 - val_loss: 0.3652 - val_recall_6: 0.8367 - 573ms/epoch - 2ms/step
Epoch 137/150
359/359 - 1s - loss: 0.2603 - recall_6: 0.8761 - val_loss: 0.3460 - val_recall_6: 0.8438 - 596ms/epoch - 2ms/step
Epoch 138/150
359/359 - 1s - loss: 0.2591 - recall_6: 0.8771 - val_loss: 0.3256 - val_recall_6: 0.8532 - 576ms/epoch - 2ms/step
Epoch 139/150
359/359 - 1s - loss: 0.2591 - recall_6: 0.8803 - val_loss: 0.4042 - val_recall_6: 0.8093 - 592ms/epoch - 2ms/step
Epoch 140/150
359/359 - 1s - loss: 0.2571 - recall_6: 0.8775 - val_loss: 0.4010 - val_recall_6: 0.8116 - 585ms/epoch - 2ms/step
Epoch 141/150
359/359 - 1s - loss: 0.2563 - recall_6: 0.8769 - val_loss: 0.1746 - val_recall_6: 0.9396 - 583ms/epoch - 2ms/step
Epoch 142/150
359/359 - 1s - loss: 0.2561 - recall_6: 0.8799 - val_loss: 0.3912 - val_recall_6: 0.8155 - 592ms/epoch - 2ms/step
Epoch 143/150
359/359 - 1s - loss: 0.2542 - recall_6: 0.8816 - val_loss: 0.1782 - val_recall_6: 0.9349 - 564ms/epoch - 2ms/step
Epoch 144/150
359/359 - 1s - loss: 0.2548 - recall_6: 0.8777 - val_loss: 0.2179 - val_recall_6: 0.9089 - 571ms/epoch - 2ms/step
Epoch 145/150
359/359 - 1s - loss: 0.2524 - recall_6: 0.8826 - val_loss: 0.2194 - val_recall_6: 0.9152 - 561ms/epoch - 2ms/step
Epoch 146/150
359/359 - 1s - loss: 0.2508 - recall_6: 0.8812 - val_loss: 0.3289 - val_recall_6: 0.8509 - 585ms/epoch - 2ms/step
Epoch 147/150
359/359 - 1s - loss: 0.2505 - recall_6: 0.8828 - val_loss: 0.2668 - val_recall_6: 0.8854 - 589ms/epoch - 2ms/step
Epoch 148/150
359/359 - 1s - loss: 0.2494 - recall_6: 0.8832 - val_loss: 0.2152 - val_recall_6: 0.9113 - 574ms/epoch - 2ms/step
Epoch 149/150
359/359 - 1s - loss: 0.2493 - recall_6: 0.8857 - val_loss: 0.3156 - val_recall_6: 0.8532 - 576ms/epoch - 2ms/step
Epoch 150/150
359/359 - 1s - loss: 0.2491 - recall_6: 0.8877 - val_loss: 0.4076 - val_recall_6: 0.8053 - 567ms/epoch - 2ms/step
63/63 [==============================] - 0s 2ms/step - loss: 0.4930 - recall_6: 0.4988
In [80]:
plt.figure(figsize = (15, 8))
plt.plot(history_6.history['loss'])
plt.plot(history_6.history['val_loss'])
plt.title('loss vs Epochs')
plt.ylabel('loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc = 'lower right')
plt.show()
In [81]:
print(classification_report(y_test, test_pred))
cm = confusion_matrix(y_test, test_pred)
labels = np.asarray(
        [
            [
                "{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())
            ]  # flatten will reshape
            for item in cm.flatten()
        ]
    ).reshape(2, 2)


plt.figure(figsize = (8, 5))

sns.heatmap(cm, annot = labels,  fmt = '', xticklabels = ['Not Exited', 'Exited'], yticklabels = ['Not Exited', 'Exited'])

plt.ylabel('Actual')

plt.xlabel('Predicted')

plt.show()
              precision    recall  f1-score   support

           0       0.88      0.90      0.89      1595
           1       0.56      0.50      0.53       405

    accuracy                           0.82      2000
   macro avg       0.72      0.70      0.71      2000
weighted avg       0.81      0.82      0.82      2000

  • The results are not an improvement on the best result we obtained earlier, so we stick with that model as our best model.
  • The confusion matrix shows that the model identifies only about half of the customers who will churn. This is not an exemplary performance, and attempts should be made to improve it further.
  • SMOTE oversampling or undersampling may be able to improve the performance; our attempt here did not yield good results, however. (An alternative, class weighting, is sketched below.)
  • The classification report shows that all the metrics except the recall/f1-score of class label 1 are above 80%, which is good.
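As an alternative to resampling, Keras's fit() accepts a class_weight argument that scales each class's contribution to the loss, so errors on the minority class cost more. A minimal sketch on the original (non-oversampled) training data; the weight for class 1 is derived from the roughly 6368:1632 split printed above and is an assumption worth tuning:

# Weight errors on the minority class (Exited = 1) more heavily,
# roughly balancing both classes' contribution to the loss
class_weights = {0: 1.0, 1: 6368 / 1632}  # ~3.9

# Plugs into fit() in place of SMOTE, e.g. with a freshly compiled model:
# history = model.fit(X_train, y_train, validation_split=0.1,
#                     epochs=150, verbose=0, class_weight=class_weights)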

Conclusion¶

In this case study,

  • We have learned how to build a feed-forward neural network for a classification task using Keras.
  • We have seen different hyper-parameters and how they affect the network.
  • We have also learned about the loss vs. epoch curve and how it helps us understand whether the model is learning the weights well.
  • We were able to achieve a test accuracy above 80% with the final model; however, the recall for class 1 was only 50%.
  • Further analysis can be performed on the misclassified points to see whether there is a pattern, or whether they were outliers that our model could not capture (a starting point is sketched after this list).
  • Other hyper-parameters can be explored to see how they affect model performance.
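For the misclassified-points analysis mentioned above, a minimal sketch; it assumes test_pred holds the chosen model's rounded predictions and X_test is the scaled feature matrix used throughout this notebook:

# Indices of the test points the model got wrong
y_true = np.asarray(y_test).ravel()
wrong_idx = np.where(test_pred.ravel().astype(int) != y_true)[0]
print("Misclassified test points:", len(wrong_idx))

# Summarize their feature values to look for patterns or outliers
misclassified = pd.DataFrame(X_test).iloc[wrong_idx]
print(misclassified.describe())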

Business Recommendations¶

  • Average age is higher among customers who have exited. Proactively reach out to older customers and check whether they are having difficulty with the bank's products or technology.
  • Balance is higher for customers who have exited. This segment may be financially savvy, so the bank should proactively offer them better interest rates.
  • Germany has a higher exit percentage. Study local branch performance to investigate this.
  • Female customers have exited at a higher rate than male customers. This should be investigated further; perhaps more relevant products need to be introduced for female customers.
  • Inactive customers show a higher exit rate. Proactively engage such customers to revive their interest and interactions with the bank.